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Foreword 


The State Key Laboratory of Genetic Resources and Evolution, located at the Kunming Institute of Zoology, 
Chinese Academy of Sciences (CAS), is a leading research center in genetic resource conservation and evolutionary 
biology. In this special issue of Zoological Research on “Animal Genetic diversity, development and evolution", we 
have compiled 12 review and research articles from the Key Lab as well as 4 related articles from external authors, 
covering, among other topics, genetic biodiversity, molecular phylogeny, evolution of gene families, and the use of 
mitochondrial DNA in the study of adaptive evolution and tumor evolution. 

Based on previous cytogenetic, morphological and molecular data, Hu JY et al summarized the higher-level 
phylogeny of Laurasiatheria, one of the richest and most diverse superorders of placental mammals. The genus 
Trachypithecus is the most diverse langur taxon, distributed in southwestern China and south and southeastern Asia. He 
K et al reconstructed the phylogeny of the 16 recognized Trachypithecus species using mitochondrial and nuclear genes. 
Wang CY et al reported on the identification of the processed medicinal insect Catharsius molossus by means of DNA 
barcoding, hopefully stimulating similar studies. Wang F et al cloned the heat shock protein 60 cDNA of Neobenedenia 
melleni (a capsalid monogenean parasite on fish skin) and studied its expression change under different temperatures 
and salinity. Wang JH et al and Wang XA et al respectively discussed the establishment of fibroblast-like cell lines of 
the endangered Crested ibis (Nipponia nippon) and the local fish Anabarilius grahami (Cypriniformes: Cyprinidae). 

Due to its rapid mutation ratio, mitochondrial DNA has been widely used in molecular evolutionary studies. In this 
issue, Chen X et al reviewed recent progress on its applications in molecular phylogeny, evolution of energy 
metabolism and adaptation. Different from normal cells, tumor cells generate energy via glycolysis even under aerobic 
conditions. In tumor cells, somatic mutations on the energy metabolism related genes (including mitochondrial ones) 
occur at a high frequency. Whether this is involved in the switch of the energy metabolism pathways is examined by 
Liu J and Kong QP. Meanwhile, Shi QH et al reported on the complete mitochondrial genome of the Painted Jezebel, 
Delias hyparete. 

During conjugation of the protozoa Paramecium caudatum, four haploid nuclei formed during the first two meiotic 
prezygotic divisions, of which only one survived. In the study by Wang YW et al, cytoplasmic microtubules (cMTs) 
were observed surrounding the surviving nuclei during the third prezygotic division, providing a new clue to the 
mechanism of its survival. The regulation of mRNA stability through RNA binding proteins plays an important role in 
embryonic development. In this issue, Xia YJ et al also showed that XZFP36L1, an RNA binding protein, is involved in 
Xenopus neural development. XZFP36L1 is expressed in specific brain regions in Xenopus embryos, and its 


overexpression inhibits neural differentiation and can lead to severe neural tube defect. 

Ye DD et al reviewed the bioinformatics pipelines for metagenomic analysis and Zhang & Su provided validation 
for an approach for peak identification for ChIP-seq data with no controls. This issue also includes reviews on the 
evolution of neurotransmitter gamma-aminobutyric acid, glutamate and their receptors; advances in the study of the 
nucleolus; and current progress on the study of ovarian germ stem cells in mammals. 
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b y3 Associate Editor-in-Chief 


(State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, the Chinese Academy of Sciences, Kunming 650223, China) 
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Introduction to the State Key Laboratory of Genetic 
Resources and Evolution 


The State Key Laboratory of Genetic Resources and Evolution, located at 
Kunming Institute of Zoology, Chinese Academy of Sciences (CAS), was established 
in 2007 on the basis of the previous "Key laboratory of Cellular and Molecular 
Evolution" of Chinese Academy of Sciences, which was established in 1990. The 


director of the key lab is academician Yaping ZHANG, and the director of the 





academic committee is academician Zuoyan ZHU. In 2011, the lab was evaluated as 
"excellent" by MOST and the lab was also awarded as one of the Outstanding Team on the Innovation Culture 
Construction of Chinese Academy of Sciences. 

The 3 major research areas of the key lab have been set to “Evolution and conservation of the diverse genetic 
resources", “Evolution of genes and genomes” and “Genetics, development and evolution". The lab has got world 
recognition in the research fields of genetic resource collection and conservation, human origin and migration, 
origin of primate cognition, origin of new genes and the genetic diversity of domestic animals etc. The aim of the 
key laboratory is to make the lab center of researches on genetic resources and evolution in China and also in the 
world in certain fields. 

The laboratory has 92 staff members, including one academician, 15 principal investigators and 10 associate 
professors. The lab currently has 15 research groups and 3 supporting units and is hosting about 100 graduate 
students. 

The lab is currently holding or participating in more than 200 research projects, including key projects from 
the National Basic Research Program (973), the National Natural Science Foundation of China and the CAS 
Innovation Program. Since 2006, the lab has published more than 400 papers indexed by SCI (Science Citation 
Index), some were on top journals including Nature and Science. Awards to the lab members include “Second 
Prize of National Natural Science Progress" (2), “First Prize of Yunnan Natural Science Progress” (3), 
“Biodiversity Leadership Award” (1), “Cheung Kong Scholar Achievements Awards" (1), etc. 

The lab is equipped with state-of-the-art instruments, including automated DNA sequencer, microarray 
scanner, GenomeLab SNPstream, high performance computing and cluster system, etc. The equipments of the lab 
valued more than RMB 20 million in total and the facilities are open to researchers from outside the lab. 

The lab promotes scientific exchange and cooperation with researchers both at home and abroad. Welcome to 


visit the lab homepage at: http://www.kiz.cas.cn/gre/ 
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Summary of Laurasiatheria (Mammalia) Phylogeny 


Jingyang HU '?, Yaping ZHANG '?, Li YU ^" 


1. Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming 650091, China 
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Abstract: Laurasiatheria is one of the richest and most diverse superorders of placental mammals. Because this group had a rapid 


evolutionary radiation, the phylogenetic relationships among the six orders of Laurasiatheria remain a subject of heated debate and 


several issues related to its phylogeny remain open. Reconstructing the true phylogenetic relationships of Laurasiatheria is a 


significant case study in evolutionary biology due to the diversity of this suborder and such research will have significant 


implications for biodiversity conservation. We review the higher-level (inter-ordinal) phylogenies of Laurasiatheria based on 


previous cytogenetic, morphological and molecular data, and discuss the controversies of its phylogenetic relationship. This review 


aims to outline future researches on Laurasiatheria phylogeny and adaptive evolution. 


Keywords: Laurasiatheria; Phylogeny; Mitochondrial DNA; Nuclear genes; Phylogenomics 


Placental mammals, which  diverged from 
marsupials about 160 million years ago (Mya) (Luo et al, 
2011, Dos Reis et al, 2012), consist of four super-orders, 
Le. Afrotheria, Xenarthra, Euarchontoglires, and 
Laurasiatheria.(Asher & Helgen, 2010; Carter, 2001; 
Eizirik et al, 2001; Hallström et al, 2011; Kriegs et al, 
2006; Kulemzina et al, 2010; Lin et al, 2002; Murphy et 
al, 2001a,b; Murphy et al, 2007; Nikolaev et al, 2007; 
Nishihara et al, 2006; Prasad et al, 2008; Springer et al, 
2004; Wildman et al, 2006; Zhou & Yang, 2010; Zhou et 
al, 2011a). Among them, Laurasiatheria is one of the 
richest and most-diverse, with an origin and a primary 
distribution on the supercontinent of Laurasia (including 
North America, Europe and Asia) during their early 
history (Springer et al, 2011; Waddell et al, 1999b). 
Although the root of placental mammals is unclear, 
Laurasiatheria has been consistently recognized as a sub- 
clade of Boreoeutheria (=Laurasiatheria+Euarchontoglires) 
(Dos Reis et al, 2012; Gibson et al, 2005; Hallstróm & 
Janke, 2008; Song et al, 2012; Madsen et al, 2001; 
McCormack et al, 2012; Murphy et al, 2001a, b; 
Nishihara et al, 2006; Springer & Murphy, 2007; 
Springer et al, 2011; Wildman et al, 2006). Many 
Laurasiatheria species have attracted great interest in 
relation to the conservation of wild animals and are also 
crucial animal models for studies on adaptive evolution. 
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As a mammalian group bearing important evolutionary 
significance and conservation value, Laurasiatheria has 
long been a focus of researches. 

Current classifications of Laurasiatheria recognize 
six orders, namely Eulipotyphla (hedgehogs, shrews, and 
moles), Perissodactyla (rhinoceroses, horses, and tapirs), 
Carnivora (carnivores), Cetartiodactyla (artiodactyls and 
cetaceans), Chiroptera (bats), and Pholidota (pangolins) 
(Hallstróm et al, 2011; Lin et al, 2002; Waddell et al, 
1999a; Wildman et al, 2006; Wilson & Reeder, 2005; 
Zhou et al, 2011a). However, the phylogeny and 
evolution of Laurasiatheria remain. subjects of heated 
debate and are not yet well- established. The origins of 
Laurasiatheria have been dated to the Cretaceous, 
ranging from 100 to 78.5-93 mya according to several 
molecular estimates (Cao et al, 2000; Hallstróm & Janke 
2008, 2010; Hallstróm et al, 2011; Hasegawa et al, 2003; 
Ji et al, 2002; Kitazoe et al, 2007; Meredith et al, 2011; 
Murphy et al, 2004; Springer & Murphy, 2007; Zhou et 
al, 2011b). However, most morphological studies place 
the origins of this super-order in the Paleocene (Asher et 
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al, 2003; Wible et al, 2007, 2009; Luo et al, 2011). The 
diversification of | mammalian species within 
Laurasiatherian orders appear to have radiated within 
1-4 mya in the Eocene (Hallstróm & Janke 2008, 2010; 
Zhou et al, 2011b). So, attempts to clarify relationships 
among the six Laurasiatheria orders from a variety of 
studies have encountered serious challenges due to the 
rapid evolutionary radiations and recent speciation 
events. Although there has been a general consensus 
regarding the earliest divergence of the order 
Eulipotyphla (Dos Reis et al, 2012; Hallstróm et al, 2011; 


Lin et al, 2002; Murphy et al, 2001a, b; Nery et al, 2012; 
Nishihara et al, 2006; Romiguier et al, 2010; Song et al, 
2012; Wildman et al, 2006; Zhou et al, 2011a, b), 
conflicting phylogenetic hypotheses exist for the other 
orders that evolved subsequently (Figure 1). In this 
article, we review the higher-level (inter-ordinal) 
phylogenies of  Laurasiatheria based on previous 
cytogenetic, morphological and molecular data, and 
discuss the controversies of its phylogenetic relationship. 
This review aims to outline future research of 
Laurasiatheria phylogeny and adaptive evolution. 


A B C D 
Eulipotyphla Eulipotyphla Eulipotyphla Eulipotyphla 
Cetartiodactyla Chiroptera Chiroptera Chiroptera 
Perissodactyla Cetartiodactyla Carnivora Carnivora 
Chiroptera Perissodactyla Cetartiodactyla Cetartiodactyla 
Carnivora Carnivora Perissodactyla Perissodactyla 
Pholidota Pholidota 

E F G H 
Eulipotyphla Carnivora Eulipotyphla Eulipotyphla 
Chiroptera Perissodactyla Carnivora Chiroptera 
Pholidota Cetartiodactyla Pholidota Cetartiodactyla 
Carnivora Pholidota Chiroptera Perissodactyla 
Cetartiodactyla Chiroptera Cetartiodactyla Carnivora 
Perissodactyla Eulipotyphla Perissodactyla Pholidota 

! J : K l L Eulipotyphla 
Perissodactyla Eulipotyphla Eulipotyphla Chiroptera 
Eulipotyphla Cetartiodactyla Perissodactyla Cetartiodactyla 
Cetartiodactyla Perissodactyla Cetartiodactyla a 
Chiroptera Chiroptera Chiroptera Pholidota 
Carnivora Carnivora Carnivora Carnivora 
Pholidota Pholidota Pholidota 

M N [9] P 
Eulipotyphla Eulipotyphla Eulipotyphla Eulipotyphla 
Cetartiodactyla Carnivora Chiroptera Carnivora 
papara Chiroptera Perissodactyla Perissodactyla 

erissodactyla ; à ` 

Carnivora Perissodactyla Carnivora Chiroptera 
Pholidota Cetartiodactyla Cetartiodactyla Cetartiodactyla 


Figure 1 Sixteen different tree topologies proposed in previous studies 
A: Waddell & Shelley, 2003; Murphy et al, 2007; Kulemzina et al, 2010. B: Waddell et al, 1999a; Murphy et al, 2001b; Arnason et al, 2002; Arnason & Janke, 
2002; Amrine-Madsen et al, 2003; Springer et al, 2004; Wildman et al, 2006; Hallström et al, 2011; Song et al, 2012. C: Mouchaty et al, 2000; Nikaido et al, 
2001. D: Jow et al, 2002; Gibson et al, 2005. E: Hudelot et al, 2003. F: Madsen et al, 2001, Figurel A. G: Madsen et al, 2001, Figure1B; Meredith et al, 2011. H: 
Murphy et al, 2001a; Waddell et al, 2001; Beck et al, 2006; Springer et al, 2007; Bininda-Emonds et al, 2007; Zhou et al, 2011b. I: Madsen et al, 2002. J: Asher 
et al, 2003; Kullberg et al, 2006. K: Matthee et al, 2007. L: Kriegs et al, 2006; Nikolaev et al, 2007. M: Nishihara et al, 2006; Romiguier et al, 2010; 
McCormack et al, 2011; Zhou et al, 2011a; Dos Reis et al, 2012. N: Prasad et al, 2008, O: Prasad et al, 2008, P: Hou et al, 2009; Nery et al, 2012. 


Laurasiatheria phylogeny inferred from 
cytogenetic and morphological evidence 


Early karyotype studies of Laurasiatheria have 
suggested there are significant differences in 
chromosome number, morphology and banding pattern 
among the six orders (Ao et al, 2007; Frónicke et al, 
1997; Kulemzina et al, 2009; Nie et al, 2011; Sotero- 
Caio et al, 2011; Trifonov et al, 2008; Yang & 
Graphodatsky, 2004; Yang et al, 2006; Ye et al, 2006). 
Kulemzina et al (2010) investigated the phylogenetic 
relationships among Laurasiatheria orders by comparing 
their karyotypes with human karyotype, and proposed 
the Laurasiatheria phylogeny as (Eulipotyphla, 
(Cetartiodactyla, ((Perissodactyla, Chiroptera), (Carnivora, 
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Pholidota)))) (Figure 1A, Table 1). Their study therefore 
supports the basal placement of Eulipotyphla, and the 
close relationship between Carnivora and Pholidota, as 
well as that between Perissodactyla and Chiroptera. 
However, the trichotomy among Cetartiodactyla, 
Carnivora + Pholidota, and Perissodactyla + Chiroptera 
is unresolved. 

Compared with limited karyotype studies, numerous 
morphological studies have been conducted on 
Laurasiatheria phylogeny reaching a variety of conclusions. 
In early morphological studies, Laurasiatheria 
monophyly was not even supported (Asher et al, 2003; 
Novacek, 1992, 2001; Simpson, 1945; Wible et al, 2007, 
2009). For examples, Perissodactyla was originally 
classified into clade Ungulata (Shoshani & McKenna, 
1998), whereas Chiroptera was often associated with 
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Table 1 Major studies on phylogenetics of laurasiatherian groups 
S Main studies Data Data size Number of orders ca 
Cytogenetic — Kulemzina et al, 2010 6 37 
Shoshani and McKenna, 1998 260 morphological characteristics 
Asher et al, 2003 433morphological characteristics 
prec Wildman et al, 2006 13 placenta features 6 20 
Wilb et al, 2007 408 morphological characters 
Wilb et al, 2009 408 morphological characters 
Pumo et al, 1998 mt genome ~16kb 5(lack Pholidota) 12 
ATP6, ATP8, Co I,Co Il, Co 4 (lack Chiroptera and 
GRO IIL, Cytb,ND1,ND2,ND3,ND4,ND4L, NDS uL M Fulipotyphl) ? 
Waddell et al, 1999b mt genome, t RNA, ND6 ~1Mb * Mg das HW. Lg 
Cao et al, 2000 mt genome ~16kb 5(lack Pholidota) 17 
Mitochond- Arnason et al, 2002 mt genome, Cytb, 128 r RNA ~16 kb 6 25 
rial genes Lin et al, 2002 mt genome, r RNA, t RNA ~16kb 5 (lack Pholidota) 29 
Jow et al, 2002 mt genome, r RNA, t RNA ~16 kb 5 (lack Pholidota) 24 
Hudelot et al, 2003 rRNA, tRNA ~3.5 kb 6 31 
Kern & Kondrashov, 2004 t RNA 6 30 
Gibson et al, 2005 mt genome ~16kb 6 31 
Kjer et al, 2007 mt genome, r RNA, t RNA, ND6 ~1.4 Mb 6 42 
Madson et al, 2001 4 nu DNA(A2AB, IRBP, Vwf, BRCA1); mtr RNA ~5 kb 6 21 
15 nu DNA (4ADOR43,ADRB2,APP-3'UTR, ATP7A, BDNF, BMII- 
Murphy et al, 2001a 3'UTR, CNRI, CREM-3'UTR, EDG1, PLCB4-3'UTR, PNOC, RAGI, ~1 kb 6 24 
RAG2, TYR, ZFX); mt DNA 
Murphy et al, 2001b 19 nuDNA (Murphy et al, 2001a; Madsen et al,2001); mt DNA ~16 kb 6 20 
Waddell et al, 2001 19 nu DNA (Madson et al, 2001; Murphy et al, 2001b); mt DNA ~10 kb 6 20 
Eizirik et al, 2001 15 nuDNA (Murphy et al, 2001a); mtDNA ~1 Mb 6 24 
Madsen et al, 2002 1 nuDNA (A2AB) ^7] kb 6 19 
Asher et al, 2003 18 nuDNA(Murphy 2 her : Kd d 312001); mtDNA; — 17 kb 6 20 
Nuclear ^ Amrine-Madsen et al, 2003 20 nuDNA(Murphy et al, 2001b, apolipoprotein B) ~17kb 6 20 
genes Wanddell & Shelley, 2003 27nuDNA UR Ara e er ML ND6, / 20 kb 6 38 
Kulberg etal 2006 PEDI ATP synthase, citrate mas) — 73899 nd Pholidota) Y 
Matthee et al, 2007 introns (THY, PRKC, MGF) ~6 kb 6 114 
Asher et al, 2007 19 nuDNA and mt ames ADR NA morphological __ 17 kb 6 19 
Springer et al, 2007 20 nuDNA (Amrine M cl 175 morphological —14kb 6 2 
26 nuDNA (42AB, ApoB, BRCA, DMP1, ENAM, GHR, IRBP, Ragl, 
Meredith et al, 2011 vWF, TTN, CNRI, BCHE, EDGI, RAGI, RAG2, ATP7A, TYRI, ~2 Mb 6 69 
BDNF, ADRB2, APP, BMII, CREM, FBNI, PLCB4, ADORA3, PNOC) 
Bashir et al, 2005 1000 orthologous genes ~1.5 Mb 4 S end 4 
Kriegs et al, 2006 retroposed elements m um ) 15 
Nishihara et al, 2006 retroposon insertion (long interspersed elements; L1) loci 5 (lack Pholidota) 8 
Nikolaev et al, 2007 218 orthologous genes ~200 kb 4 pue ac 4 
Prasd et al, 2008 protein-coding sequence and non-coding sequence ~1.9Mb 5 (lack Pholidota) 13 
Hallstróm & Janke,2008 3012 orthologous genes ~280 kb £ mee pov 6 
Phylogeno: Hou et al, 2009 27705 orthologous genes 740 MB «(lack F holidota and 4 
mics Eulipotyphla ) 
Romiguier et al, 2010 1138 orthologous genes ~300kb 5 (lack Pholidota) 10 
Hallstróm et al, 2011 4775 orthologous genes ~6Mb 5 (lack Pholidota) 12 
McCormack et al, 2011 683 loci of ultraconserved elements 5 (lack Pholidota) 6 
Zhou et al, 201 1a 8 orthologous genes ~5 kb 6 11 
Zhou et al, 201 1b 97 orthologous genes ~46 kb 6 13 
Nery et al, 2012 3733 orthologous genes ~50Mb _ 5 (lack Pholidota) 6 
Dos Reis et al, 2012 857 orthologous genes and 153 mt genome ~2 Gb 5 (lack Pholidota) 153 
Song et al, 2012 447 orthologous genes —7Mb 5 (lack Pholidota) 11 
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Dermoptera (flying lemur) (Asher et al, 2003; Novacek, 
1992; Shoshani & McKenna, 1998; Simmons & Geisler, 
1998). Gunnell & Simmons (2005) proposed that the 
grouping of Chiroptera with Dermoptera was spurious 
due to the homoplasious morphological characters. 
Based on analyses of 13 placenta features, Wildman et al 
(2006) first supported that Laurasiatheria formed a 
monophyletic clade, which was confirmed by later 


molecular studies, and favored the  Laurasiatheria 
phylogeny as (Eulipotyphla, (Chiroptera, 
(Cetartiodactyla, (Perissodactyla, (Carnivora, 


Pholidota))))) (Figure 1B, Table 1). 

Similar to the karyotype studies, the earliest 
divergence of Eulipotyphla and the close association 
between Carnivora and Pholidota were supported. 
Regarding the phylogenetic placements of the other three 
orders, however, morphological studies demonstrated 
more resolutions and debates than the karyotype studies. 


Laurasiatheria phylogeny inferred from 


molecular evidence 


Phylogenetic estimate from mitochondrial genes 

Due to relatively small effective population size and 
lack of recombination as well as easy obtainment, 
mitochondrial DNA (mtDNA) has often been chosen as a 
preferred molecular marker in early molecular 
phylogenetics (Bargelloni et al, 2000; Brito & Edwards, 
2009; Phillips & Penny, 2003; Sanchez-Gracia & 
Castresana, 2012; Springer et al, 2001; Sturmbauer & 
Meyer, 1993; Yu et al, 2007), 

Pumo et al (1998) analyzed 12 complete mt genome 
sequences of five Laurasiatheria orders, placing 
Chiroptera as the sister group of Cetartiodactyla, 
Perissodactyla and Carnivora, while positioning 
Eulipotyphla as the most basal order of placental 
mammals, thus rejecting Laurasiatheria monophyly. Cao 
et al (1998) investigated the mt genome sequences of 
three orders, placing Cetartiodactyla as the sister group 
of Perissodactyla and Carnivora. By summarizing the 
results proposed in the “International Symposium on the 
Origin of Mammalian Orders”, Waddell et al (1999b) 
suggested the Laurasiatheria phylogeny as (Eulipotyphla, 
(Chiroptera, (Cetartiodactyla, (Perissodactyla, (Carnivora, 
Pholidota))))) (Figure 1B, Table 1). These relationships 
are consistent with those from the morphological study 
of Wildman et al (2006). Moreover, these relationships 
are also supported by later analysis of 13 important rare 
genomic changes (RGCs) (Springer et al, 2004). By 
adding the mtDNA genome sequences of other orders 
and by using different tree-building schemes, later 
studies obtained the same phylogenetic relationships as 
those from Waddell et al (1999b) and Wildman et al 
(2006) (Arnason et al, 2002; Arnason & Janke, 2002; 
Cao et al, 2000; Gibson et al, 2005; Kern & Kondrashov, 
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2004; Kjer & Honeycutt, 2007; Lin et al, 2002; Reyes et 
al, 2004) (Figure 1B, Table 1) with the exception of 
Eulipotyphla positioned as the most basal order of 
placental mammals, and Cetartiodactyla as the sister 
group of Perissodactyla and Carnivora (Waddell et al, 
1999a). 

Conversely, Nikaido et al (2001) and Mouchaty et al 
(2000) proposed that Eulipotyphla and Chiroptera were 
clustered together, which were closely related to 
Perissodactyla, Carnivora and Cetartiodactyla (Figure 1C, 
Table 1). Jow et al (2002) utilized all mt tRNAs and 
rRNAs to reconstruct the Laurasiatheria phylogeny 
among orders except for Pholidota. The resulting tree 
topology was (Eulipotyphla, (Chiroptera, (Carnivora, 
(Cetartiodactyla, Perissodactyla)))) (Figure 1D, Table 1). 
Subsequently, by adding the mt tRNAs and rRNAs of 
Pholidota, Hudelot et al (2003) claimed the 
Laurasiatheria phylogeny was (Eulipotyphla, (Chiroptera, 
(Pholidota, (Carnivora, (Cetartiodactyla, 
Perissodactyla))))) (Figure 1E, Table 1). Notably, their 
study rejected the general view of the clustering of 
Carnivora and Pholidota (Arnason et al, 2002; Arnason 
& Janke, 2002). Therefore, except for the early 
divergences of Eulipotyphla and Chiroptera, the 
phylogenetic relationships among the other four orders 
are inconsistent with all the previous studies. 


Phylogenetic estimate from nuclear genes 

Although analyses of mtDNA sequences have 
provided insights into the phylogeny and evolution of 
Laurasiatheria, the fact that all genes comprising mt 
genome are inherited as a single, haploid linkage unit is a 
well-known limitation on phylogenetic reconstruction 
because the resulting mt gene trees are unlikely to reflect 
one independent estimate of the species tree (Giannasi et 
al, 2001; Johnson & Clayton, 2000; Moore, 1995; Page, 
2000; Yu et al, 2004; Yu & Zhang, 2006a). Therefore, 
when much longer nuclear DNA sequences become 
widely available, nuclear genes received more and more 
attention in molecular phylogenetic studies, and 
demonstrated superiority in resolving deep phylogenies 
(Springer et al, 1999, 2001). Since 2001, a series of 
Laurasiatheria studies based on analyses of nuclear genes 
made important contributions to the resolution. of 
relationships among Laurasiatheria orders, entering into 
an unprecedented progress. 

Madsen et al (2001) first use three nuclear 
genes, combined with mt tRNA and rRNAs (-5 kb in 
total), to investigate Laurasiatheria relationships and 
proposed a tree topology of ((Carnivora, Perissodactyla), 
(Cetartiodactyla, (Pholidota, (Chiroptera, Eulipotyphla)))) 
(Figure IF, Table 1). When another nuclear gene (BRCA/) 
was added and the taxon of representatives was increased 
to test the stability of the resulting tree, the tree topology 
changed to (Eulipotyphla, ((Carnivora, Pholidota), 
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(Chiroptera, (Cetartiodactyla, Perissodactyla)))) (Figure 
1G, Table 1), indicating that Laurasiatheria phylogeny 
varied with the genes and species used in the analyses. 

In the same year by screening 15 nuclear 
genes (~10 kb) and three mt genes, Murphy et al (20012) 
supported a Laurasiatheria phylogeny of (Eulipotyphla, 
(Chiroptera, ((Cetartiodactyla, Perissodactyla), 
(Carnivora, Pholidota)))) (Figure 1H, Table 1). 
Interestingly, their study clustered Cetartiodactyla with 
Perissodactyla, though this relationship received low 
statistical support. The subsequent study of Waddell et al 
(2001) using 12 nuclear genes (~10 kb) and mt genome 
sequences retrieved the same Laurasiatheria phylogeny 
and similar nodal supports (Figure 1H, Table 1). These 
results were also supported by the analysis of 20 nuclear 
genes and 175 morphological characters by Springer et al 
(2007) (Figure 1H, Table 1). 

Remarkably, when the dataset is increased to 19 
nuclear genes and three mt genes (~16 kb), Murphy et al 
(2001b) showed that the Laurasiatheria phylogeny 
was (Eulipotyphla, (Chiroptera, (Cetartiodactyla, 
(Perissodactyla, (Carnivora, Pholidota)))) (Figure 1b 
and Table 1). Differences from other nuclear trees thus 
mainly lie in the phylogenetic positions of 
Cetartiodactyla and Perissodactyla. By adding a new 
nuclear marker (apolipoprotein B gene) to Murphy et al 
(2001b)’s dataset, Amrine-Madsen et al (2003) achieved 
the same phylogenetic results (Figure 1B, Table 1). 
These results were congruent with those from the 
morphological and mt genome analyses (Arnason et al, 
2002; Amason & Janke, 2002; Springer et al, 2004; 
Wildman et al, 2006). 

Madsen et al (2002) attempted to evaluate the utility 
of Alpha 2B adrenergic receptor gene in the 
Laurasiatheria phylogeny, which showed the tree as 
(Perissodactyla, ((Eulipotyphla, Cetartiodactyla), 
(Chiroptera, Carnivora, Pholidota))) (Figure 1I, Table 1). 
Surprisingly, Perissodactyla was placed as the basal and 
Eulipotyphla was the sister-group of Cetartiodactyla. 
Waddell & Shelley (2003) developed seven nuclear gene 
fragments and determined the Laurasiatheria phylogeny 
to be (Eulipotyphla, (Cetartiodactyla, ((Perissodactyla, 
Chiroptera), (Carnivora, Pholidota))) (Figure 1A, 
Table1). 

Asher et al (2003) examined 18 nuclear genes, in 
combination with three mt genes and 196 morphological 
characters, and presented the Laurasiatheria phylogeny 
as (Eulipotyphla, ((Cetartiodactyla, Perissodactyla), 
(Chiroptera, (Carnivora, Pholidota)))) (Figure 1J, Table 
1). Notably, Chiroptera was not placed at the second 
diverging lineage in Laurasiatheria, as suggested in most 
previous studies (Madsen et al, 2001; Murphy et al, 
2001b), but displayed a close relationship with Carnivora 
+ Pholidota. Using 8 housekeeping genes from four 
Laurasiatheria orders, Kullberg et al (2006) retrieved a 
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phylogeny of (Chiroptera, (Cetartiodactyla, (Perissodactyla, 
Carnivora))) (Figure 1J, Table 1). Despite weak support, 
this result was consistent with most mt phylogenies (Cao 
et al, 2000; Lin et al, 2002). 

Different from those protein-coding nuclear genes 
commonly used in previous studies, Matthee et al (2007) 
first utilize three nuclear introns (~6 kb) from 114 
Laurasiatheria species to rebuild the relationships among 
the six orders. Their study supported the phylogeny of 
(Eulipotyphla, (Perissodactyla, (Cetartiodactyla, 
(Chiroptera, (Carnivora, Pholidota))))) (Figure 1K, Table 
1). However, these relationships were poorly supported. 
Asher (2007) reanalyzed 19 nuclear and three mt 
sequences data from Murphy et al (2001b) as well as 196 
morphological characters from Asher et al (2003) (Table 
1). By incorporating indels of sequence data, the study 
suggested that the phylogenetic relationships among the 
Laurasiatheria orders changed with the combinations of 
characters and the analytic methods. 

Based on the above studies, Laurasiatheria 
phylogeny at the ordinal level remains an outstanding 
problem in mammalian systematics due to the 
contradicting conclusions reached under different 
datasets, especially in the case of the phylogenetic 
positions of Perissodactyla, Cetartiodactyla and Chiroptera. 


Phylogenomics of Laurasiatheria 

With the increasing availability of genomic data of 
more species, phylogenetic analysis, which use genomic 
data to infer evolutionary relationships, is entering a new 
era (Delsuc et al, 2005; Rokas & Chatzimanolis, 2008; 
Yu & Zhang, 2006a, b). By performing phylogenomic 
studies, a reliable phylogenetic tree can be reconstructed 
using many more characters than those in previous 
studies, including gene families, large insertion and 
deletion gene fragments, and gene rearrangement, etc. 
(Meslin et al, 2011; Wu et al, 2012; Yu et al, 2011). 
These characters might be useful for distinguishing 
nodes resulting from rapid radiation episodes such as the 
Laurasiatheria speciation events. 

Bashir et al (2005) attempted to reconstruct the 
Laurasiatheria phylogeny of four orders (Eulipotyphla, 
Cetartiodactyla, Perissodactyla, and Carnivora) based on 
more than 1000 orthologous repetitive elements obtained 
from publically available genome sequence data. In their 
study, Perissodactyla and Carnivora were grouped 
together, whereas the placements of the other two orders 
remained unclear. 

Using retroposed elements as new markers to 
reconstruct the Laurasiatheria phylogeny with the 
exception of Perissodactyla, Kriegs et al (2006) 
determined the tree as (Eulipotyphla, (Chiroptera, 
(Cetartiodactyla, (Pholidota, Carnivora)))) (Figure 1L, 
Table 1). On the other hand, Nishihara et al (2006) 
performed a comprehensive comparison of orthologous 
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retroposon insertion (long interspersed elements; L1) loci 
among five orders of Laurasiatheria (lacking Pholidota). 
These loci supported the phylogeny as (Eulipotyphla, 
(Cetartiodactyla, (Chiroptera, (Perissodactyla, 
Carnivora))) (Figure 1M, Table 1). Their most 
interesting finding was that Carnivora, Perissodactyla, 
and Chiroptera were grouped together, excluding 
Cetartiodactyla and Eulipotyphla, which has not been 
recovered by previous studies. 

Based on 218 protein-coding genes (~200 kb) 
obtained from the analysis of 18 placental mammalian 
genomes, Nikolaev et al (2007) suggested that 
Eulipotyphla diverged first, followed by Chiroptera, and. 
Cetartiodactyla and Carnivora were sister-group (Figure 
1L, Table 1). Subsequently, by searching for informative 
coding indels within whole-genome sequence data and 
amplifying them in Laurasiatheria species, Murphy et al 
(2007) yielded a new tree topology as (Eulipotyphla, 
(Cetartiodactyla, ((Chiroptera, Perissodactyla), 
(Carnivora, Pholidota)))) (Figure 1A, Table 1). The close 
relatedness of Chiroptera and Perissodactyla has been 
only recovered in karyotype research (Kulemzina et al, 
2010). 

Prasad et al (2008) reconstructed the phylogeny of 
Laurasiatheria except for Pholidota based on 1.9 Mb 
gene regions. The protein-coding sequence analyses 
supported the tree as (Eulipotyphla, (Carnivora, 
(Chiroptera, (Perissodactyla, Cetartiodactyla)))) (Figure 
IN, Table 1), whereas the combination of coding and 
non-coding sequence analyses favored the tree as 
(Eulipotyphla, (Chiroptera, (Perissodactyla, (Carnivora, 
Cetartiodactyla)))) (Figure 1O, Table 1), showing a lack 
of consistency in Laurasiatheria phylogeny under 
different kinds of characters used. The lack of 
consistency with such large amounts of data can be also 
evidenced from Hou et al (2009), in which 2705 
orthologous genes (~40 Mb) from Cetartiodactyla, 
Perissodactyla, and Carnivora were used, and different 
tree topologies were produced under different tree- 
building methods. When Chiroptera was added into the 
analysis, they found close relatedness of Perissodactyla 
with Carnivora, and that of Cetartiodactyla with 
Chiroptera (Figure 1P, Table 1). 

Hallström & Janke (2008) screened 3 012 genes 
(~280 kb) from four orders with available genome 
sequences (Eulipotyphla, Chiroptera, Carnivora, and 
Cetartiodactyla). Their results only recovered the basal 
placement of Eulipotyphla, and failed to resolve the 
relationships of the other three orders. The rapid 
radiation of Laurasiatheria within a narrow time scale 
was proposed to explain the irresolution using such large 
amounts of data. By utilizing the third condon position 
GC content (GC3) of 1 138 protein-coding orthologous 
genes (~300 kb), Romiguier et al (2010) revealed the 
Laurasiatheria phylogeny to be (Eulipotyphla, 
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(Cetartiodactyla, (Chiroptera, (Perissodactyla, Carnivora)))) 
(Figure 1M, Table 1). 

Hallstróm et al (2011) analysed 4 775 protein- 
coding genes (~6 Mb) screened from the available 
genome sequences of five Laurasiatheria orders and 
determined the tree as (Eulipotyphla, (Chiroptera, 
(Cetartiodactyla, (Perissodactyla, Carnivora)))) (Figure 
1B, Table 1). Their results were more consistent with 
morphological and mt genome analyses as well as 
nuclear analyses of Murphy et al (2001b). However, 
Pholidota was not included in the analyses. When the 
retroposon insertions from these genome sequences were 
analysed, their study supported the earliest divergence of 
Eulipotyphla, but failed to resolve the relationships 
among the other four orders. Additionally, McCormack 
et al (2011) analysed 683 loci of ultraconserved elements, 
supporting the tree topology (Eulipotyphla, (Cetartiodactyla, 
(Chiroptera, (Perissodactyla, Carnivora)))), which was 
consistent with Nishihara et al (2006) (Figure 1M, Table 
1). 

By screening protein-coding genes within genome 
sequence data and amplifying them in Laurasiatheria 
species, Zhou et al (2011a) analyzed 8 markers (~5 kb) 
and reconstructed the Laurasiatheria phylogeny as 
(Eulipotyphla, (Cetartiodactyla, (Chiroptera, (Perissodactyla, 
Carnivora)))) (Figure 1M, Table 1), consistent with the 
results from Nishihara et al (2006) and McCormack et al 
(2011). Subsequently, Zhou et al (2011b) investigated 
Laurasiatheria phylogeny based on 97 orthologous genes 
(~46 kb) from six orders. Regardless of datasets and 
analytic methods used, they obtained an identical tree 
topology of (Eulipotyphla, (Chiroptera, ((Carnivora, 
Pholidota), (Cetartiodactyla, Perissodactyla)))) (Figure 
1H, Table 1). This result was consistent with those from 
Murphy et al (2001a) and Waddell et al (2001). 

Recently Nery et al (2012) choose 3 733 
orthologous genes (~50 Mb) obtained from available 
genome sequences of five orders, namely Eulipotyphla, 
Chiroptera, Carnivora, Cetartiodactyla, and Perissodactyla. 
They obtained a Laurasiatheria phylogeny of (Eulipotyphla, 
((Chiroptera, Cetartiodactyla), (Perissodactyla, Carnivora))) 
with high supports (Figure 1P, Table 1). 

As the most comprehensive study so far, Dos Reis 
et al (2012) studied 11 available whole genomes of five 
Laurasiatheria orders (except Pholidota). Based on 857 
nuclear genes and 153 mt genomes (-2 Gb), they 
reconstructed the tree topology as (Eulipotyphla, 
(Cetartiodactyla, (Chiroptera, (Perissodactyla, (Carnivora, 
Pholidota)))) (Figure 1M, Table 1), which was 
consistent with Nishihara et al (2006) and McCormack et 
al (2011). 

Most recently, Song et al (2012) analyzed 447 
orthologous genes (~7 Mb) and used multispecies 
coalescent model analysis to investigate the phylogenetic 
relationships among the five orders of Laurasiatheria 
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(except Pholidota). Their results strongly favored 
(Eulipotyphla, (Chiroptera, (Cetartiodactyla, (Perissodactyla, 
Carnivora))) (Figure 1B), supporting Waddell et al 
(19992), Murphy et al (2001b), Arnason et al (2002), 
Arnason & Janke (2002), Amrine-Madsen et al (2003), 
Springer et al (2004), Wildman et al (2006), and 
Hallstróm et al (2011). 


Perspectives 


The phylogeny of  Laurasiatheria, which is 
characterized by rapid species radiations and short 
internal tree branches, has long been one of the most 
controversial and challenging problems in mammalian 
systematics. So far, only the earliest divergence of 
Eulipotyphla and the close relationship between 
Carnivora and Pholidota has been generally accepted. 
The main controversies are concentrated on the 
phylogenetic positions of the other three orders, i.e., 
Chiroptera, Cetartiodactyla and Perissodactyla. 
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Abstract: Gamma-aminobutyric acid (GABA) and glutamate are two important amino acid neurotransmitters widely present in the 
nervous systems of mammals, insects, round worm, and platyhelminths, while their receptors are quite diversified across different 
animal phyla. However, the evolutionary mechanisms between the two conserved neurotransmitters and their diversified receptors 
remain elusive, and antagonistic interactions between GABA and glutamate signal transduction systems, in particular, have begun to 
attract significant attention. In this review, we summarize the extant results on the origin and evolution of GABA and glutamate, as 
well as their receptors, and analyze possible evolutionary processes and phylogenetic relationships of various GABAs and glutamate 
receptors. We further discuss the evolutionary history of Excitatory/Neutral Amino Acid Transporter (EAAT), a transport protein, 
which plays an important role in the GABA-glutamate “yin and yang” balanced regulation. Finally, based on current advances, we 


propose several potential directions of future research. 
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Amino acid neurotransmitters play significant roles 
in animal nervous systems. Different amino acid 
neurotransmitters have different functions in nervous 
systems. Interestingly, in mammals GABA and glutamate 
coexist as a pair of antagonistic neurotransmitters with 
glutamate functioning as an activator and GABA as a 
repressor. So far, no comprehensive analysis has been 
conducted in relation to the origin and evolution of the 
“yin and yang” balanced relationship between GABA 
and glutamate. Here we present a scenario on the 
evolution of these two antagonistic acid neurotransmitters 
and their receptors through comprehensive analysis on 
literature and related data. 


GABA and glutamate as neurotransmitters exist in 
early animal lineages 

Gamma-aminobutyric acid is well known as the 
main inhibitory neurotransmitter in the brains of 
mammals (DeFeudis, 1975). Such a function appears 
ancestrally derived as evidence shows that GABA exerts 
similar inhibitory functions in the nervous system of 
even the most primitive animals. For example, in the 
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hydrozoan Hydra vulgaris, which has the most simple 
and archaic nervous system, GABA decreases the 
outputs of impulse generating systems, including 
ectodermal body contraction bursts (CBs), the number of 
pulses in a burst (P/CB), and endodermal rhythmic 
potentials (RPs) (Kass-Simon et al, 2003). Consistently, 
GABA has inhibitory effects in some Platyhelminthes 
such as the flatworm Gastrothylax crumenifer, which 
evolved the first central nervous system (Verma et al, 
2010). In the phylogenetically advanced nematodes, 
GABA becomes the main inhibitory neurotransmitter 
(McIntire et al, 1993). There are some exceptions for 
special species such as Cancer borealis, in which the 
stomach cells can produce an excitatory reaction after 
GABA binding (Swensen et al, 2000). In contrast to 
animals, GABA in plants plays a role in various 
physiological processes such as the tricarboxylic acid 
cycle, nitrogen metabolism, insect defense, oxidative 
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stress defense, osmoregulation, and signal transmission, 
which are extensively reviewed elsewhere (Bouché & 
Fromm, 2004). 

Glutamate is the most abundant excitatory 
neurotransmitter in the brain of vertebrates (Nakanishi, 
1992), and almost one third of central nervous system 
synapses contain glutamate (Fiorillo & Williams, 1998). 
Most glutamate in the brain comes from the glutamine 
and Krebs cycles. Like GABA, glutamate has been 
proposed to play roles in the nervous system of the most 
primitive animals. Antagonistic to GABA, glutamate 
increases the outputs of ectodermal and endodermal 
impulse generating systems in the phylogenetically basal 
hydrozoan Hydra vulgaris (Kass-Simon et al, 2003), and 
acts as an excitatory neurotransmitter in cestode and 
aplysiidae nervous systems (Davis & Stretton, 1995; 
Verma et al, 2010; Gerschenfeld, 1973; Kehoe & Marder, 
1976; Segerberg & Stretton, 1993). Glutamate is the 
main excitatory neurotransmitter in the nervous systems 
of nematodes and fruit fly (Devaud et al, 2008). However, 
glutamte could also induce an inhibitory function 
through some special receptors in nematode and fruit fly 
(Schuske et al, 2004). 

Both GABA and glutamate have existed in the 
earliest stage (coelenterate) of animal evolution, and they 
usually function antagonistically. As will be discussed 
below, the structural and functional differentiation 
between GABA and glutamate receptors during 
evolution has given rise to diverse receptors with more 
specialized functions. 


Evolution of GABA and glutamate receptors 

The GABA receptors, which recognize and bind 
GABA, are located in the postsynaptic membrane. 
During binding, ion permeability of the membrane 
changes (Lü & Tian, 1991). There are three types of 
known GABA receptors: GABA,, GABA and GABAc. 
GABA, is the most common inhibitory neurotransmitter 
receptor in the brain and is a kind of ligand-gated 
chloride channel. GABAg is a G protein-coupled 7- 
transmembrane receptor, which can activate second 
messenger system and K', Ca^ channels. It is a 
heterodimer and is composed of two homologous 
subunits, GABAg, and GABA (White et al, 1998). It is 
believed that GABAg and glutamate receptors have 
common ancestor and will be discussed below. The 
GABAc receptor, composed of GABAp subunits, is a 
newly discovered GABA,-like receptor, and is also a 
kind of ligand-gated chloride channel (Chebib & 
Johnston, 1999). Structurally, the p subunit is part of the 
family of GABA, receptor subunits (Shimada et al, 
1992), and thus GABAc is usually regarded as a special 
GABA, (Barnard et al, 1998) The phylogenetic 
relationship of the p subunit between GABA, receptor 
subunits is included in Figure 2. 
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Evolution of GABA, receptor genes 

The GABA, receptor belongs to the inhibitory 
ligand-gated ion channel family. This family also 
includes glycine receptor, another amino acid 
neurotransmitter suggested to be derived from the 
GABA, receptor family (Vassilatis et al, 1997). In 
invertebrates such as nematodes, ascarids, and fruit flies, 
GABA, receptor-like genes exist in pairs (Hosie et al, 
1997; Martínez-Delgado et al, 2010), encoding class I 
and II subtypes, which are homologues of a and f 
subunits of the vertebrate GABA 4 receptors, respectively. 
In contrast, GABAA receptor genes exist as clusters in 
vertebrates (Tsang et al, 2007), suggesting a gene family 
expansion in vertebrates. The cluster mode of GABA, 
receptors in vertebrates very likely originated after the 
divergence between vertebrates and invertebrates, and 
the common ancestor of the coupled GABA, genes may 
be the a-ß like genes of bilaterians (Tsang et al, 2007). 
There are at least 16 GABA; receptor subunits (a1—a6, 
B1-B3, y1—y3, 6, £, x and 0) in the central nervous system 
of vertebrates (Wilke et al, 1997). These subunits are 
classified into three types according to the similarity of 
the subunit amino acid: a1—a6; B1—-B3 and 0 (B-like); and 
yl-y3 and e (y-like). Generally, the GABA, receptor is 
composed of two a, two p, and one y subunits (Chang et 
al, 1996; Tretter et al, 1997). Human GABA; receptor 
genes are distributed in four gene clusters on four 
chromosomes (4, 5, 15, X) (Darlison et al, 2005). 
Darlison et al (2005) stated that different GABA, 
receptor subunits in vertebrates may have originated 
from gene duplications from the vertebrate ancestor 
based on the similarities among gene clusters (Figure 1). 
Duplication probably occurred during the two whole 
genome duplications (WGD) in the evolution of 
vertebrates. The positionally corresponding genes in 
these gene clusters have similar functions, which also 
supports the duplication scenario (Darlison et al, 2005). 
Within each gene cluster, some genes experienced 
further individual gene duplications (Figure 1). 
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Figure 1 Evolution model of human GABA, receptor. 
The figure is modified from Darlison et al (2005) 
Every gene cluster contained three kinds of subunits (y, a, and p). e is y-like 
and 0 is B-like. The two WGD events occurred about 500 and 430 million 
years ago, respectively. The direction of the arrows below the subunits 


shows the transcriptional direction. 
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In Drosophila, three kinds of ligand-gated ion 
channel subunit genes were cloned: resistance to dieldrin 
(RDL), GABA and glycine-like receptor of Drosophila 
(GRD) and ligand-gated chloride channel homologue 3 
(LCCH3), which shows a sequence similarity to the 
GABA, receptor subunits in vertebrates (Hosie et al, 
1997; Martínez-Delgado et al, 2010). Totally, Drosophila 
has 12 functionally differentiated GABA, receptor-like 
genes, which are scattered in different genomic regions 


and not present as gene clusters. The ascarids have 39 
GABA, receptor-like genes resulting from single gene 
duplication and subsequent subfunctionalization (Tsang 
et al, 2007). The phylogenetic tree in Figure 2 is the 
relationship between insect (Drosophila) GABA, 
receptor like genes and vertebrate ligand-gated receptor 
subunits, showing that GRD is close to the GABA, 
receptor a subunit, LCCH3 is close to GABA, B subunit, 
and RDL is close to vertebrate GlyR. 





85 GABAA alpha-1 
87 GABAA alpha-2 
100 GABAA alpha-3 
100 GABAA alpha-5 
GABAA alpha} | Vertebrate GABA 
97 100 GABAA alpha-6 receptor o/y subunit 
95 GABAA gamma-1 
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100 GABAA gamma-3 
55 GABAA gamma-4 
GRD Drosophila melanogaster 
RDL Drosophila melanogaster 
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Figure 2 Phylogenetic tree constructed by Neighbor-Joining via MEGA version 5.0 with Bootstrap Test (1 000 replications) 


demonstrating the similarities among the known insect (Drosophila) GABA receptor subunits and vertebrate 


ligand-gated anion-channels 


The phylogenetic tree is based on similarities in the amino acid sequences. Vertebrate GABA receptor subunits are marked as a, p, etc., GLY refers to GlyR 


subunits while Glu-Cl refers to glutamate-gated chloride-channels from Haemonchus contortus (Hosie, et al, 1997, Martinez-Delgado, et al, 2010). GABA 
alpha-1: P62813.1, GABA alpha-2: P26048.1, GABA alpha-3: P26049.1, GABA alpha-4: AAB46866.1, GABA alpha-5: P19969.1, GABA alpha-6: P16305.3, 
GABA beta-1: P18505.2, GABA beta-2: P47870.2, GABA beta-3: P63079.1, GABA gamma-1: NP_034382.2, GABA gamma-2: P18507.2, GABA gamma-3: 
P28473.1, GABA gamma-4: P34904.1, GABA rho-1: NP_002033.2, GABA rho-2: NP_002034.2, GABA rho-3: NP 001099050.1, glycine alpha-1 Homo 
sapiens: NP_001139512.1, glycine alpha-2 Rattus norvegicus: NP_036700.1, glycine, alpha-3 Homo sapiens: NP_006520.2, glycine beta Homo sapiens: 
NP_001159532.1, Glu-Cl Haemonchus contortus: P91730.1, GRD Drosophila melanogaster: Q24352.1, LCCH3 Drosophila melanogaster: AAS65370.1, RDL 


Drosophila melanogaster: AAF50311.1 


Other ion channels evolved from GABA, ligand- 
gated ion channel 

As mentioned above, the vertebrate ligand-gated ion 
channel, GlyCl receptor, evolved from the GABA, 
receptor family (Vassilatis et al, 1997). It is interesting 
to note that the invertebrate specific glutamate receptor 
like protein, GluCI, is also derived from the GABA, 
receptor family (Ortells & Lunt, 1995). In invertebrates, 
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no GlyR has been identified (Tsang et al, 2007), and 
invertebrate GluCI may be homologous to vertebrate 
GlyCl because they both have a special cystine ring 
structure (Vassilatis et al, 1997). Invertebrate-specific 
GluCT will be discussed in the context of glutamate 
receptors below. 

Interestingly, although GABA is a main inhibitory 
neurotransmitter in nematodes, both inhibitory GABA 
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receptor transporting Cl and excitatory GABA receptor 
transporting Na’ in C. elegans exist, and thus GABA 
plays an excitatory role under some circumstances 
(Schuske et al, 2004; Yates, et al, 2003). Beg & 
Jorgensen (2003) cloned the excitatory GABA receptor 
gene exp-/ from C. elegans. By sequence analysis they 
found that protein EXP-1 had the closest relationship 
with inhibitory GABA receptor. Therefore, it is unlikely 
that the GABA ligand-gated cation channels evolved 
from cation channels by mutations, but rather from 
mutations of the GABA ligand-gated anion channels 
(Beg & Jorgensen, 2003). Whether other species also 
have excitatory GABA receptors needs further 
investigation. 


Evolution of glutamate receptors 

There are two types of glutamate receptors: the 
ionotropic receptor (iGluR) and metabotropic receptor 
(mGluR) (Xie et al, 2003; Yano et al, 1998). When 
stimulated by glutamate, iGluR usually directly regulates 
cation influx, while mGluR promotes the second 
messenger G-protein, which leads to increased 
permeability of Na’ and K* on membranes and 
membrane potential depolarization. Ribeiro et al (2005) 
showed that S. mansoni has the homologs of mammal 
mGlu receptor-like genes and iGluRs (Ribeiro et al, 
2005). The mGluRs have been reported in more 
primitive metazoa, for example, mGluRs can be part of 
the sensory system in spongia (Miller, 1998). 


Evolution of iGluRs 

The iGluRs are tetramer type or pentamer type 
proteins, including N-methyl-D-aspartate (NMDA) type 
and non-NMDA type. The NMDA receptors mainly 
regulate the flow of Na’ and K^ by ligand-gated form, 
while non-NMDA receptors mainly use voltage-sensitive 
channels to control Na' influx depolarizing current. 
Interestingly, there is a kind of glutamate receptor in 
plants, whose sequence is highly similar with iGluR in 
animals (Chiu et al, 1999). The glutamate receptor in 
plants and animals shares the same domains, including 
two ligand binding domains and four transmembrane 
domains (MI-M4) (Chiu et al, 1999). The highest 
identity in these domains is 60%, discovered in the M3 
domain. These facts suggest that the core function of 
iGluRs could have existed before the division of animals 
and plants (Chiu et al, 1999). Chiu et al (1999) also 
found that the primitive iGluR receptors in plants do not 
function as ion channels, but depend on interaction with 
other proteins. Therefore, the real iGluR ion channel 
should appear after the division of animals and plants, 
especially after the division of NMDA and non-NMDA 
receptors (Ryan & Grant, 2009). 

Animal iGluRs are usually excitatory glutamate 
ligand-gated cation channels, which mainly exist on the 
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nerve joints (Cully et al, 1996). As mentioned above, 
however, some special inhibitory iGluRs (GluCI) are 
found in invertebrates, such as the nematodes and fruit 
fly (Schuske et al, 2004). It is believed that the GluCl 
receptors evolved from GABA ligand-gated chloride 
channels (Chiu et al, 1999). These receptors mainly exist 
in muscle and neuronal cell bodies, which can inhibit 
muscle excitation caused by glutamate ligand-gated 
cation receptors (Brockie et al, 2001). How the GluCl- 
inhibitory receptor and its regulatory system evolved and 
why they are absent in vertebrates remains unclear. 


Evolution of mGluRs, which are homologous to 
GABA receptors 

There are eight kinds of mGluRs, namely mGluR1- 
mGluR8 (Xie et al, 2003). Cao et al (2009) reported that 
GABAg receptors and mGlu receptors have the same 
ancestor based on sequence and structure similarity. They 
may have evolved from the periplasmic amino acid 
binding proteins (PBPs) of bacteria (Cao et al, 2009). 
The phylogenetic tree in Figure 3 shows the relationship 
among mGlu receptors (mGluR1-mGluR8), GABApg 
receptors (GABAs; and GABA yp), and their ancestor 
homolog leucine-binding protein (LBP), a kind of PBP. 
The GABAg receptor can be found in Drosophila 
(Mezler et al, 2001), nematodes (Bargmann, 1998), and 
hydra (Kass-Simon et al, 2003; Kass-Simon & 
Scappaticci, 2004), suggesting that GABAg receptors 
may have existed before these species diverged. In more 
primitive animal porifera there is a protein highly 
homologous to the GABAgRI receptor and mGluR5 
receptor in rats. After analyzing the function of this 
protein, Perovic et al (1999) found that the 
mGlu/GABAp-like proteins may even exist in porifera 
when the GABAg and mGlu receptor have not been fully 
differentiated (Perovic et al, 1999). 


GABA-glutamate balancing in vivo and evolution of 
glutamate transport proteins 

Several *yin and yang" regulation mechanisms 
between GABA and glutamate exist in the central 
nervous systems of mammals. The NMDA receptor, a 
subunit of glutamate receptors, which exists in 
parvalbumin-positive neurons, can monitor strong 
excitatory signals and then stimulate inhibitory neurons 
(Gordon, 2010). When the excitatory signal becomes 
weak, NMDA receptors cease monitoring the excitatory 
signal. Therefore, the NMDA receptor functions by 
balancing stimulatory and inhibitory signals, although 
the intrinsic mechanism is unknown (Gordon, 2010). 
Moreover, a regulatory network exists in which GABA, 
receptors and glutamate receptors interact with each 
other. For example, in the spin of hippocampal pyramidal 
neurons, NMDA receptor activation leads to 
phosphorylation of GABA, which causes the degradation 
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Figure 3 Phylogenetic tree constructed with Neighbor-Joining via MEGA version 5.0 with Bootstrap Test (1 000 replications) 


demonstrating that mGluR and GABAg receptors may have a common ancestor 


The phylogenetic tree is based on the similarities of amino acid sequences. The numbers after each gene are GenBank accession numbers. 


of GABAgs, finally reducing the inhibitory signal of 
GABA». On the other hand, at the parallel fibre-purkinje 
cell synapse, GABAg receptor activation enhances 
mGluRla signal transduction (Gassmann & Bettler, 
2012). Glutamate and GABA transport proteins also 
participate in the balanced regulation between the two 
neurotransmitters. There are two transport protein 
families in rodents, namely Excitatory/Neutral Amino 
Acid Transporter (EAAT) 1/2 and GABA transporter 
(GAT) 2/3, which transport glutamate and GABA, 
respectively. The EAAT 1/2 transfers glutamate from 
extracellular to intracellular together with three Na* and 
one H' when transporting one K^ in the opposite 
direction (Héja et al, 2009). Therefore, the co-existence 
of GAT with EAAT in glial cells can reverse GAT by 
high concentration Na’, and release GABA from glial 
cells to inhibit neurons. In summary, with the help of 
EAAT and GAT, excitatory glutamate can cause the 
release of GABA which displays inhibitory effects (Héja 
et al, 2009). However, the mechanism underlying the 
regulation of GABA on glutamate is still unclear. 
Gesemann et al (2010) analyzed sequence variation 
of the EAAT family in vertebrates. There are seven 
EAAT genes in humans and mice, and nine to thirteen 
genes in prototheria, sauropsida, and amphibia. These 
differences between species are mainly due to different 
genome duplication and gene loss history. Whole 
genome duplications occurred 500 million years ago 
(mya) in vertebrates, while an additional duplication 
occurred in teleosts about 350 to 320 mya. Thus, zebra 
fish have thirteen EAAT genes. Species-specific gene 
loss also determines kinds and number of EAATs in a 
certain species. For example, all theria lost EAAT6 and 
EAAT7, lepidosauromorpha retained EAAT7 but lost 
EAAT6, and archosauromorpha lost EAAT7 but retained 
EAATS. It is not clear whether glutamate-GABA balance 
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exists in invertebrates or not, although EAAT genes have 
been cloned from invertebrates (Gesemann et al, 2010). 


Perspective 

GABA and glutamate act as inhibitory and 
excitatory neurotransmitters, respectively, although they 
display opposite functions in some special species or in 
some special tissues, such as GABA in nematodes and 
glutamate in fruit fly and nematodes. We found that the 
exceptional cases mainly appeared in invertebrates. Their 
diverse functions rely on the refined evolution of their 
receptors, which turn out to be multiform and more 
elaborate. Unlike vertebrates in which there is a GABA- 
glutamate balance system mediated by GABA and 
glutamate transport proteins, in invertebrates the balance 
may be regulated by those specialized receptors. 
However, the regulatory mechanism and its evolutionary 
destination remains an open question. One particularly 
interesting example is the invertebrate specific GluCl 
receptor, which evolved from the GABA receptor and 
functions as an exceptionally inhibitory glutamate 
receptor. How this receptor evolved and why its function 
seems lost during the emergence of vertebrates is unclear. 
Future exploration addressing this question will shed 
light on the evolution of the GABA-glutamate balance 
system. Similarly, another interesting issue comes from 
the phylogenetic similarity between mGluRs and 
GABAg receptors. Some evidence shows that the 
GABAg receptor regulates mGluRs in the GABA- 
glutamate balance system, and that mGlu/GABA y-like 
proteins may have even existed in early organisms when 
the GABAs and mGlu receptor were not fully 
differentiated. The how and when the two ancient 
paralogous genes functionally diverged remain 
unanswered, as does the evolutionary significance of this 
functional divergence. 
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How glutamate and GABA quickly turn over 
excitatory and inhibitory effects in animals has long been 
an enigma. Recently, research on the balance between 
glutamate and GABA has focused on the EAAT and GAT 
transporter system in mammals. But there are still many 
uncertainties in this balance system. Firstly, while 
evidence indicates how glutamate causes the release of 
GABA, a few studies show the opposite process. 
Secondly, our review shows the evolution of EAAT, but 
little is known about the GAT. Furthermore, following 
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Involvement of XZFP36L1, an RNA-binding protein, 
in Xenopus neural development 
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Abstract: Xenopus ZFP36L1 (zinc finger protein 36, C3H type-like 1) belongs to the ZFP36 family of RNA-binding proteins, which 
contains two characteristic tandem CCCH-type zinc-finger domains. The ZFP36 proteins can bind AU-rich elements in 3' 
untranslated regions of target mRNAs and promote their turnover. However, the expression and role of ZFP36 genes during neural 
development in Xenopus embryos remains largely unknown. The present study showed that Xenopus ZFP36L1 was expressed at the 
dorsal part of the forebrain, forebrain-midbrain boundary, and midbrain-hindbrain boundary from late neurula stages to tadpole 
stages of embryonic development. Overexpression of XZFP36L1 in Xenopus embryos inhibited neural induction and differentiation, 
leading to severe neural tube defects. The function of XZP36L1 requires both its zinc finger and C terminal domains, which also 
affect its subcellular localization. These results suggest that XZFP36L1 is likely involved in neural development in Xenopus and 


might play an important role in post-transcriptional regulation. 


Keywords: ZFP36L1; RNA-binding protein; Neural development; Xenopus; Post-transcriptional regulation 


mRNA stability is an important mechanism for gene 
expression regulation, which is critical for embryogenesis 
and normal physiological processes. The zinc finger 
protein 36 (ZFP36) family proteins are key players in 
post-transcriptional regulation and contain two or more 
highly conserved tandem CCCH zinc fingers (TZF) of 
tandem Cx8Cx5Cx3H repeats (x represents a variable 
amino acid). These factors are able to bind to AU-rich 
elements (AREs) in the 3' untranslated regions (UTR) of 
targeted mRNAs through their TZFs (Carballo et al, 
2000; Lai & Blackshear, 2001; Lai et al, 1999; Lai et al, 
2000), which leads to the removal of the poly(A) tail off 
the targeted mRNAs and promotes their degradation 
(Carrick et al, 2004; Lai & Blackshear, 2001). The 
ZFP36 family proteins are evolutionarily conserved, 
including four rodent members (ZFP36, ZFP36LI, 
ZFP36L2, and ZFP36L3) and three mammal members 
(with ZFP36L3 missing, Blackshear et al, 2005), each of 
which has two conserved ZF domains. Four members 
have been reported in Xenopus and zebrafish, among 
which ZFP36, ZFP36L1, and ZFP36L2 are conserved 
while the fourth member (Xenopus ZFP36L2.1, ZFP36L2.2 
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and zebrafish ZFCTHI) contains four ZF domains 
instead of two. The four ZFP36 genes in Xenopus have 
been cloned and are widely expressed in adult tissues, 
except for XZFP36L2.1 (XC3H-4), which is detected 
only in the ovary (De et al, 1999). It has been suggested 
that XZFP36L2b (XC3H-3b) plays an important role in 
Xenopus kidney development (Kaneko et al, 2003), but 
little is known about the roles of other ZFP36 members 
in embryonic development. 

Also known as cMGI, TISI1b, ERF1, BRF1, C3H- 
2, and Berg36 (Barnard et al, 1993; De et al, 1999; Lai et 
al, 1990; Varnum et al, 1991), ZFP36L1 has been 
reported to regulate vascular endothelial growth factor 
(VEGF) mRNA stability in — adrenocorticotropic 
hormone-stimulated primary adrenocortical cells acting 
as a negative regulator of VEGF during development 
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(Ciais et al, 2004). Research has also shown ZFP36L1 to 
be regulated by Von Hippel-Lindau (VHL) tumor 
suppressor protein in renal cancer cells (Sinha et al, 
2009). In addition, ZFP36L1 is phosphorylated by p38 
MAPK/MAPK-activated protein kinase 2 (MK2) and 
PKB/AKT pathways. Phosphorylation of ZFP36L1 
inhibits its ARE mRNA decay activity, but does not 
affect its ability to bind AREs (Benjamin et al, 2006; 
Maitra et al, 2008; Schmidlin et al, 2004). Due to a 
failure of chorioallantoic fusion, ZFP36L1 knockout 
mice died in utero before embryonic day 11 (E11) 
(Stumpo et al, 2004). The XZFP36L1 gene contains 345 
amino acids and shows 76% identity to human BRF-1 
(De et al, 1999). However, the developmental expression 
and function of Xenopus ZFP36L1 remains largely 
unknown. In this study, we cloned XZFP36L1 and 
investigated its expression pattern and potential function 
during Xenopus laevis embryo development. 


MATERIALS AND METHODS 


Plasmid construction 
The XZFP36L1 full open reading frame (ORF) and 
all truncated fragments were amplified by polymerase 
chain reaction (PCR) using the XL073b24 clone as a 
template, which was provided to us by the National 
Institute for Basic Biology, Japan (NIBB) The PCR 
primers were: 
XZFP36L1 ORF: forward: 5'-ATGTCTACAGCTTTGAT 
TTCTCC-3' 
and reverse: 5'- ATCATCAGAGATAGAAAGTCTGC-3* 
N-term truncate: forward: 5'-ATGTCTACAGCTTTGAT 
TTCTCC-3' 
and reverse: 5'-CTGTCCCCCAGGCTTTTGGAGAA-3'; 
NZF truncate: forward: 5'-ATGTCTACAGCTTTGAT 
TTCTCC-3' 
and reverse: 5'- TCTCTCTTCTGCATTGTGGATG-3'; 
ZF domains: forward: 5'-CAGGTGAACTCCAGCAGG 
TACAAG-3' 
and reverse: 5'- TCTCTCTTCTGCATTGTGGATG-3* 
ZFC truncate: forward: 5'- CAGGTGAACTCCAGCAG 
GTACAAG -3' 
and reverse: 5'- ATCATCAGAGATAGAAAGTCTGC-3* 
C-term truncate: forward: 5- GAGAGACGCTTGGTAT 
CAGGAC-3' 
and reverse: 5'- ATCATCAGAGATAGAAAGTCTGC-3". 
For expression in Xenopus embryos and cultured 
cells, the fragments were cloned into a eukaryotic 
expression vector pCS2+ with a Flag tag coding 
sequence at the C-terminal. 


Phylogenetic analysis of the ZFP36 protein family 
Twenty full coding amino acid sequences of the 

ZFP36 family were used for Bayesian phylogenetic 

analysis, including ZFP36 H.sapi (AAHO09693), 
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ZFP36 M.musc 
(NP 001081884), ZFP36 D.rerio 
ZFP36 C.elegans (NP 505927) ZFP36L1_ H.sapi 
(NP 004917), ZFP36L1_ M.musc (NP 031590), 
ZFP36L1A_X.laev (NP 001089645), ZFP36LIB X./aev 
(NP 001084214) ZFP36L1A D.rerio (NP 001070621), 
ZFP36LIB D.rerio (NP 955943), ZFP36L2 H.sapi 
(NP 008818), ZFP36L2 M.musc (NP 001001806), 
ZFP36L2A X.laev (NP 001080610), ZFP36L2B X.laev 
(NP 001081886), | ZFP36L2 D.rerio | (NP 996938), 
ZFP36L2.1 X.laev (NP. 001081888), ZFP36L2.2 X.aev 
(NP 001108269), ZFP36L3 M.musc (NP 001009549), 
and ZFCTHI D.rerio (NP. 571014). 

Caenorhabditis elegans ZFP36 (C3H-1) were 
included as an out-group. Sequence alignment was 
performed using Clustal W (Thompson et al, 1994). The 
Bayesian approach was implemented in MrBayes3.12 
(Ronquist & Huelsenbeck, 2003) with estimation of the 
substitution rate model, and gamma distribution of 
among site rate variation two runs were used, each with a 
single chain of 2x10" generations, sampled every 10000 
generations. Convergence was assessed by comparing 
the standard deviation of split frequencies between runs. 
We excluded 1000 trees from a total sample of 2001 
trees in each run. Sample independence was confirmed 
as no significant increase in log-likelihoods observed 
after the burn-in phase. 


(AAH21391), ZFP36 X.laev 


(XP. 002665468), 








In situ hybridization, section and RT-PCR analysis 

Whole mount in situ hybridization (WMISH) on X. 
laevis was carried out as previously described (Gawantka 
et al, 1995). To examine potential overlapping expression, 
digoxigenin- and fluorescein-labeled RNA probes were 
used for double WMISH. Plasmid XL073b24 was cut 
using BamHI, and transcribed with T7 RNA polymerase 
to generate an antisense probe of XZFP36L1. The probes 
of X. laevis N-Tubulin (Papalopulu & Kintner, 1996), 
Sox2 (Ma et al, 2007), Slug (Aybar et al, 2003), Xath1 
(Kim et al, 1997), En2 (Hemmati-Brivanlou et al, 1991), 
Pax3 (Sato et al, 2005), Dbx2 (Ma et al, 2011), and 
Nkx6.2 (Zhao et al, 2007) were used as previously 
described. Stained embryos were embedded in paraffin, 
sectioned at 20 um, and photographed using a light 
microscope (Leica). 

Semi-quantitative reverse transcription PCR was 
carried out as previously described (Wu et al, 2010). The 
PCR primers and conditions were: 

XZFP36L1A: forward: 5'-TGCATGAGCAGCAGACGT 
G-3' and reverse: 5'-GGTCTTATGCGTCTCTCCATG-3’, 
30 cycles; 

XZFP36L1B: forward: 5''TGCATGAGCAGCAGAGGA 
A-3' and reverse: 5'-GCCCTTCTGTGTCTCTAAACA- 
3', 30 cycles; H4 was used as a loading control: forward: 
5'-CGGGA TAACATTCAGGGTA-3’ and reverse: 5'- 
TCCATGGCG GTAACTGTC-3’, 26 cycles. 
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Embryo culture and microinjection 

In vitro fertilization, embryo culture, and WMISH 
of Xenopus embryos were carried out as per Gawantka et 
al (1995). Embryos were staged according to Nieuwkoop 
and Faber (1967). Capped mRNA of X. laevis ZFP36L1 
and its truncated forms were synthesized in vitro using 
SP6 mMessage Machine kit (Ambion). Microinjection 
was carried out using the PLI-1 Pico-injector (Harvard 
Apparatus) equipped with an MK-1 micromanipulator 
(Singer Instruments). Embryos were injected in one 
dorsal blastomere at the 4-cell stage with 0.5—1 ng 
XZFP36L1 mRNA or 0.3-0.6 ng of truncated mRNA. 
Injected mRNA was traced by co-injected 100 pg nuclear 
f-galactosidase, which was stained with Red-Gal 
substrate (Research Organics). The detection of £- 
galactosidase activity for tracing cell lineage was carried 
out as previously described (Kurata & Ueno, 2003). 


Cell transfection and immunohistochemistry 

The XZFP36L1 ORF and its truncated constructs 
were cloned in frame into a eukaryotic expression vector 
pCS2+ with a C-terminal Flag tag coding sequence. Hela 
cell culture, transfection, and immunohistochemistry 
were carried out as described in previous research (Wu et 
al, 2010). 


RESULTS 


Cloning and phylogenetic analysis of ZFP36 family 
proteins 

Through BLAST search against EST databases, two 
alleles of X. laevis ZFP36L1 were found, XZFP36L1A 


and XZFP36L1B, which shared 97% identity at the 
amino acid level. The full ORF of XZFP36LIB was 
cloned by PCR using an EST clone as a template 
(XL073b24, from the National Institute for Basic 
Biology, Japan). In the following studies on XZFP36L, 
this allele was used unless otherwise stated. Consistent 
with its mammal orthologues, XZFP36L1 had two 
tandem ZF domains. To confirm the evolutionary 
relationships between the homologues of ZFP36L1, a 
phylogenetic tree was constructed using MrBayes. The 
nomenclature of the XZFP36 members was according to 
Xenbase (Bowes et al, 2008). The ZFP36 family proteins 
were grouped into four well-supported vertebrate clades 
inferred from the phylogenetic results (Figure 1). The 
results indicate that ZFP361, ZFP36L1, ZFP36L2, and 
ZFP36L3 were indeed paralogues that arose by gene 
duplication from a common ancestor to the vertebrate 
lineage. However, ZFP36L3 in rodents and the fourth 
ZFP36 genes in fish (ZFCTHI) and frog (ZFP36L2.1 
and 2.2) were lineage specific. Among the ZFP36 related 
proteins, members of the ZFP36L1 and ZFP36L2 
subfamilies were more closely related. 
Temporal and spatial expression of XZFP36L1 
during Xenopus laevis early development 

We examined XZFP36L] expression during X. 
laevis embryogenesis by RT-PCR and WMISH. Both 
XZFP36L1A and B showed similar temporal expression 
patterns, as detected by RT-PCR. They were both 
expressed at low levels maternally and in embryos from 
blastula to gastrula stages. Their expression went up at 
early neurula stages and remained stable at later stages 
(Figure 2A). 
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Figure 1 Bayesian tree depicting the phylogenetic relationships of the ZFP36 proteins. Branch confidence values are shown as 
Bayesian posterior probabilities. The scale bar shows the number of substitutions per site. 
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Figure 2 Temporal and spatial expression patterns of XZFP36L1 
A: RT-PCR analysis of the developmental expression of XZFP36L1A and B; B-I: Expression patterns of XZFP36L1 revealed by WMISH and paraffin sections; 


B: Stage 10, the embryo was bisected, the arrowheads indicate the expression of XZFP36L1 in the ectoderm and involuting mesoderm; C: Stage 16, anterior 


view; D: Stage 20, lateral view; E: Stage 20, anterior view; F-I: Stage 32 embryos: F: Lateral view. The broken lines indicate the position of corresponding 


sections showed in F’ and F”. Br, branchial arches. F’, F”: Transverse sections of stage 32 embryo, the arrowheads indicate the expression of XZFP36LI in the 


dorsal brain. ot, otic vesicle; G: Dorsal view of the head region; H: Horizontal section showing its expression in the fore-midbrain boundary (red arrowheads) 
and mid-hindbrain boundary (black arrowheads); I: Double WMISH of En2 (red) and XZFP36L1 (blue). Arrowheads in (C-T) indicate the expression of 
XZFP36L1 in the forebrain (blue), fore-midbrain boundary (red) and mid-hindbrain boundary (black). White arrowheads in (C) and (E) indicate its expression 


at the presumptive neural placode region. Bars in (B-1)-0.5 mm. 


Since the two alleles of XZFP36L1 shared more 
than 9095 identity, it was difficult to design probes to 
distinguish their expression patterns by in situ 
hybridization. Here we used XZFP36L1B as a template 
for in situ probe. At the gastrula stage, XZFP36L1 
transcripts were detected in both ectoderm and 
mesoderm (Figure 2B). During neurula stages, its 
expression became enriched in the neural system, 
especially in the anterior neural folds (Figure 2C), where 
their expression became gradually restricted into three 
discontinuous patches, corresponding to the presumptive 
forebrain, fore-midbrain boundary. and mid-hindbrain 
boundary, respectively (Figure 2D, E). Its expression was 
also detected in the placode regions at the anterior neural 
plate borders (Figure 2C, E). At tadpole stages, 
XZFP36L1 remained expressed in the three 
discontinuous regions of the brain and also in the 
branchial arches (Figure 2F, G). These domains were 
confirmed to be the forebrain, fore-midbrain boundary, 
and mid-hindbrain boundary, respectively, by horizontal 
sections (Figure 2H, arrow heads) and double in situ 
hybridization with En2, a mid-hindbrain boundary 
marker (Figure 2I). Transverse sections of stage32 
embryos showed that XZFP36L1 was mainly expressed 
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at the dorsal part of the neural tube (Figure 2F', F”). 
These data showed that XZFP36L1 was expressed in 
specific brain regions during neurulation and might play 
a role in neural development. 
Overexpression of XZFP36L1 leads to neural tube 
defects 

To investigate the function of XZFP36L1 during 
Xenopus embryogenesis, we overexpressed XZFP36L1 
in Xenopus embryos by injecting synthetic XZFP36L1 
mRNA into one dorsal cell at the 4-cell stage. 
Interestingly, the injected embryos showed severe neural 
tube defects (NTDs). Compared to the control group, the 
neural tubes of the injected embryos failed to close 
properly and remained open at tailbud stages (57%, n=81, 
Figure 3A, B). To further investigate the role of 
XZFP36L1 in neural development, we checked the 
expression of several neural markers in ZFP36L1 mRNA 
injected embryos. The results showed that many of the 
neural markers were inhibited at neurula and tailbud 
stages, including N-tubulin (67%, n=24, Figure 3C), 
Sox2 (86%, n=35, Figure 3D), Slug (85%, n=20, Figure 
3E), En2 (84%, n=25, Figure 3F), Nkx6.2 (73%, n=15, 
Figure 3G), Pax3 (70%, n=17, Figure 3H), Dbx1 (69%, 
n=16, Figure 31), and Xath1 (6296, n=13, Figure 3J). 
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Figure 3 Overexpression of XZFP36L1 leads to NTDs in Xenopus embryos 
A: Control embryos, stage25; B: Embryo injected with XZFP36L1 mRNA shows severe defects during neural tube closure (arrowheads), stage 25; C—J: 
Overexpression of XZFP36L1 inhibits neural differentiation. Expression of neural marker genes N-tubulin (C), Sox2 (D), Slug (E), En2 (F), Nkx6.2 (G), Pax3 
(H), Dbx1 (1) and Xath1 (G) is largely inhibited in the injected sides (arrowheads). (C—F), stage 15; (G-J), stage 25. Bars=0.5 mm. 


These results suggested that overexpression of 
XZFP36L1 inhibited neural induction and differentiation 
in Xenopus embryos, leading to severe neural tube 
defects. 


The ZF and C-terminal domains of XZFP36L1 are 
required for its activity and subcellular localization 

The XZFP36L1 protein contains an N-terminal 
domain, two ZF domains, and a C-terminal domain 
(Figure 4A). To determine which domains are essential 
for its function in neural development, several truncated 
forms of XZFP36L1 were constructed (Figure 4A), 
transcribed into mRNAs, and injected into Xenopus 
embryos. Embryos were examined at stage 20, when the 
neural tube was completely closed. The results showed 
that truncates containing both the ZF and the C-terminal 
domains led to defects in neural tube closure for Xenopus 
embryos like the full length ORF (Figure 4F), whereas 
other truncated forms lacking either the ZF motif or the 
C-terminal domain did not exhibit such an effect (Figure 
4 D, E, G). These results suggest that both the ZF and the 
C-terminal domains were necessary for its function. 

The XZFP36L1 gene belongs to the ZFP36 RNA- 
binding protein family. The ZF domains of ZFP36 family 
proteins are suggested to bind to the AREs of targeted 3' 
UTRs. Thus, these proteins are supposed to locate in the 
cytoplasm to execute their functions. To determine 
whether the activity of XZFP36L1 in neural development 
was related to its subcellular localization, we fused the 
full-length XZFP36L1 ORF and its truncates with a C- 
terminal Flag tag coding sequence and performed 
immunohistochemistry using anti-Flag antibodies in 
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transfected Hela cells. As expected, the full length ORF 
was detected dominantly in the cytoplasm, which was 
consistent with human ZFP36L1 (CMG 1) results (Figure 
4H; Phillips et al, 2002). The construct containing only 
the N-terminal domain showed granular distribution in 
the cytoplasm (Figure 4I). All the other truncated forms 
were more enriched in the nucleus, except for the one 
containing both the ZF and C-terminal domains, which 
showed higher signal in the cytoplasm compared to other 
truncates (Figure 4J-M). According to the above data, the 
activity of these truncates in NTDs seemed consistent 
with their localization in the cytoplasm, indicating that 
the cytoplasmic localization of ZFP36L1 was important 
for its function. 


DISCUSSION 


In this study, we showed that XZFP36L1 was 
expressed in specific brain regions in Xenopus embryos, 
and overexpression of this protein inhibited neural 
induction and differentiation and led to severe NTDs, 
indicating that XZFP36L1 was involved in Xenopus 
neural development. We also tried to knockdown the 
expression of endogenous ZFP36L1 using specific 
morpholinos. Unfortunately, no clear phenotypes were 
observed in the injected embryos, although the 
morpholinos actively inhibited the expression of the 
reporter GFP genes carrying the targeted sequences (data 
not shown). The reason for this might be that either the 
morpholino was not efficient enough to block 
XZFP36L1 expression, or its function was compensated 
by other members of this gene family, since it has been 
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Figure 4 Activity and cellular localization of different XZFP36L1 truncates 
A: The schematic structures of XZFP36L1 protein and its truncates; B-G: Both the ZF motif and the C-terminal domain are required for its activity in neural 
development; B: control embryos. Overexpression of full length XZFP36L1 (C) or the ZFC truncate (F) leads to defects in neural tube closure (arrowheads), 


while overexpression of the NZF truncate (D) or ZF domains (E) or C-terminal truncate (G) has no obvious effect on neural tube closure; H-M: Localization of 


XZFP36L and its truncates in transfected Hela cells. 


reported that the ZFP36 protein family are all RNA- 
binding proteins and may share some common target 
mRNAs, e.g., TNF- a (Carballo et al, 1998; Lai et al, 
2000), GM-CSF(Carballo et al, 2000; Lai & Blackshear, 
2001), and VEGF(Bell et al, 2006; Brennan et al, 2009; 
Ciais et al, 2004). 

The function of XZP36L1 required both its ZF and 
C terminal domains, the ZF domains for binding to AREs 
and targeting mRNA degradation and the C-terminal 
domain to contain nucleus export sequences responsible 
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Abstract: Though Yunnan province contains some 562 known species of fish, no cell lines from any of these have been made 
available to date. To protect germplasm resources and provide an effective tool in solving problems at cellular level of Anabarilius 
grahami, a fish endemic to Fuxian Lake, Yunnan, China, we established and characterized the major features of a continuous cell line 
(AGF II) from the caudal fin tissue of A. grahami. This AGF II cell line consists of fibroblast-like cells and has been subcultured 
more than 60 times over the course of a year. The cell line was maintained in DMEM/F12 supplemented with 10% FBS, with a 
cellular doubling time of 51.1 h. We continued with more experiments to optimize the culture and storage conditions, and found a 
variety of interesting results: cells could grow at temperature between 24 ?C and 28 ?C, with the optimal temperature of 28 ?C. 
Likewise, the growth rate of A. grahami fin cells increased when the FBS proportion increased from 5% to 20%, with the optimal 
growth at the concentrations of 20% FBS; cells were able to grow in L-15 and DMEM/F12 with optimal growth at L-15; DMSO is a 
better cryoprotectant than Glycerol, EG and MeOH for AGFII cells with optimal concentration of 5% DMSO. Chromosome analysis 
also showed that the distribution of chromosome number varies from 38 to 52, with a modal peak at 48 chromosomes, accounting for 
39.8% of all cells. Using the same primer pairs specific to mtDNA, the AGF II cell sequences obtained by PCR were identical to 
those from muscle tissues of A. grahami. Both chromosome analysis and PCR amplification confirmed the AGF II cells were from A. 
grahami, also indicating that that current long-term artificial propagation of A. grahami has been successful. Finally, we noted that 
when cells were transfected with pEYFP-N1 and pECFP-N1 plasmid, bright fluorescent signals were observed, suggesting that this 
cell line may be suitable for use in transfection and future gene expression studies. 


Keywords: Characterization; Fibroblast-like cell line; Anabarilius grahami; Cryopreservation 


In vitro cultures of animal cells have provided a 
powerful tool in studying different a variety of subjects, 
including immunology (Bols et al, 2001), environmental 
toxicology (Bahich & Borenfreund, 1991), endocrinology 
(Bols & Lee, 1991), virology (Wolf, 1988), biotechnology 
and aquaculture (Bols & Lee, 1991), and resources 
protection (Zhou et al, 2008; Chen & Qin, 2011). Since 
the first reported fish cell line in 1962 (Wolf & Quimby), 
over 280 lines have been established, most of derived 
from freshwater and anadromous fish species (Wolf & 
Mann, 1980; Fryer & Lannan, 1994; Lakra et al, 2011). 
To date, more than 50 cell lines have been established in 
China including over 20 species (Chen & Qin, 2011). 
Among these fish, most are economically important 
species, such as Anguilla japonica (Ku et al, 2010), 
Chrysophrys major (Chen et al, 2003), Lateolabrax 
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japonicus (Ye et al, 2006) and Epinephelus akaara 
(Huang et al, 2009); nevertheless, only a few are rare fish, 
such as Acipenser sinensisi (Zhou et al, 2008) and 
Gobiocypris rarus (Tan et al, 2009). In Yunnan province 
alone, home to china’s largest biodiversity hotspot, there 
are 562 known species of fish (Yang & Li, 2010), and yet 
no cell lines are available. Clearly, more attention should 
be paid to the rare and endemic fish cell cultures in 
future. 
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One such native species endemic to Fuxian Lake, 
Yunnan, Anabarilius grahami (Regan) belongs to 
Cypriniformes, Cyprinidae and has been assessed as a 
vulnerable species on the China species red lists (Wang 
& Xie, 2004) and evaluated as a threatened fish due its 
declining numbers (Liu et al, 2009). Fuxian Lake, the 
second deepest lake in China (Ley et al, 1963; Yang, 
1984), has a great wealth of endemic fish, some4.7% of 
all Yunnan fish species (Yang & Chen, 1995). However, 
because of exotic species invasion and over-exploitation, 
the production of A. grahami has sharply declined in 
recent years (Yang & Chen, 1995; Li et al, 2003a; Xiong 
et al, 2006). To date, a few reports on its Karyotype (Zan 
& Song, 1980), ecology (Yang, 1994; Li et al, 2003a), 
diet composition (Qin et al, 2007), artificial propagation 
(Li et al, 2003b,c), disease control (Li et al, 2003d), 
embryonic development (Ma et al, 2008) and genetic 
diversity (Yang et al, 2008; Liu et al, 2011) have been 
made available. Unfortunately, little research has been 
carried out on A. graham's cell culture of. To further 
protect this valuable species and aid in its recovery, Li et 
al (2003b) succeeded in launching artificial cultivation 
and the release of some fingerlings into Fuxian Lake 
annually since 2007. Liu et al (2009) also noted that an 
investigation of the restocked population is necessary. 
Aquaculture by artificial cultivation has inherent 
limitations, e.g. germplasm degradation, reduction of 
genetic diversity, and a weakened ability to defend 
against disease and environmental stress, (Chen, 2002; 
Su et al, 2012). A cell culture of A. grahami could 
provide an effective tool in studying these problems on 
cellular level. 


MATERIALS AND METHODS 


Primary cell culture 

A healthy juvenile of A. grahami (10 cm total length) 
was obtained from the tenth generation of artificial 
cultivation from the Endangered Fish Conservation 
Center (EFCC), Kunming Institute of Zoology, Chinese 
Academy of Sciences. The specimen was disinfected 
with potassium permanganate. Caudal fin tissue was 
clipped and transferred to flasks using aseptic techniques 
and then washed three times with Hanks’ balanced salt 
solution (HBSS) supplemented with antibiotics 
(penicillin, 400 IU/ml; streptomycin, 400 ug/ml; 
fungizone, 10 ug/ml). The washed tissue was minced into 
about 1 mm? in 10 cm diameter Petri dishes, and 
transferred into T-25 flasks (Corning) containing 3 mL of 
growth medium (DMEM/F12, Gibco) supplemented with 
20% fetal bovine serum (FBS, Gibco), 100 IU/ml of 
penicillin, and 100 ug/ml of streptomycin. All flasks 
were incubated at 28 'C. After 3 days, 3 mL of growth 
medium was added into the uncontaminated flasks. 
Monolayers of primary cells were obtained after 30-40 
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days (Clem et al, 1961; Chen & Qin, 2011; Freshney, 
2010). 


Sub-culture and maintenance 

Each flask was examined to ensure it was free of 
microbial contamination and normal morphology of the 
cells. The spent medium was removed to a new flask and 
then combined with the same volume of fresh medium, 
making a conditioned medium. The cells were washed 
with HBSS and covered with 1 mL of 0.25% trypsin- 
EDTA solution (Invitrogen). Each culture was examined 
under an inverted microscope (Olympus, CKX41) to 
ensure that cells had separated from the flask surface. 
The trypsin solution was decanted and the cells were 
dislodged by gently tapping the flask. A volume of 10 
mL fresh or conditioned medium was added to the flask 
and shaken to obtain a cell suspension. Finally, 5 mL of 
the suspension was transferred to a new flask and 
incubated at 28 'C. The cells were confluent within 3-4 
days (Zhou et al, 2008; Chen & Qin, 2011). 


Growth of cells 

Cells were placed at an initial density of 4.8x10*/ml 
into a 1-25 flask with DMEM/F12+10% FBS at 28 C. 
Cells from the flasks were collected and counted daily 
for 9 days using a haemocytometer,; this process was 
repeated three times (Freshney, 2010). 


Effect of medium on cell growth 

The effect of medium on cell growth was evaluated 
in 4-well. Cells was respectively cultured in DMEM/F 12, 
DMEM, M199 and L-15 with 10% FBS at 28 'C. Cells 
of duplicate wells in each medium were trypsinized and 
counted with a haemocytometer daily for 5 days (Huang 
et al, 2009; Ou et al, 2010). 


Effect of temperature and FBS concentrations on cell 
growth 

Cells were seeded into 4-well containing L-15 
medium with 10% FBS and incubated at 20 °C, 24 C, 
28 'C and 32 'C, successively. Duplicate wells were used 
for each temperature. In the same manner, L-15 medium 
containing 5%, 10%, 15%, 20% and 25% FBS were 
assessed for their effect on cell growth at 28 'C. In the 
three tests, the average number of cells was calculated 
daily for 4 days using a haemocytometer (Ku et al, 2010; 
Lakra et al, 2011). 


Effect of cryoprotectant on cell cryopreservation 

The procedure of cell cryopreservation as follows: 
cells were propagated for 3-4 days in T-75 flasks until 
confluent monolayers had formed and cell suspension 
from each flask was collected into a 50 mL centrifuge 
tube and then centrifuged at 200 r/min for 8 min. The 
pellets were suspended in FBS containing optimal 
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cryoprotectant to obtain a concentration of 5-6x10^ 
cells/ml. Cell suspensions were then transferred to 2-mL 
cryovials and kept sequentially at -20 'C for 4 h, -80 'C 
for 16 h and finally stored in liquid nitrogen -196 °C 
(Freshney, 2010; Chen & Qin, 2011). The frozen cells 
were recovered in a water bath at 37 °C, transferred to L- 
15 medium (10% FBS) in a T-25 flask and incubated at 
28 'C. The medium was decanted from the flask 24 h 
after incubation and replaced with 5 mL of fresh L-15 
medium. The flask was incubated at 28 'C until the cells 
reached confluence. 

We progressively examined the effects of 
cryoprotectants, DMSO, Glycerol, ethylene glycol (EG) 
and Methanol (MeOH) on cell cryopreservation., 
allowing us to then examine suitable concentrations of 


better cryoprotectant at 5%, 7.5%, 10%, 12.5%, and 15%. 


The quality of the cells recovered after thawing was 
assessed according to two criteria: (1) viability of cells, 
assessed by Trypan blue exclusion assay; and (2) 
adhesion ability of the thawed cells, estimated by 
counting the cells and then seeding them in 4 wells with 
L-15+10% FBS, and dissociated with 0.25% Trypsin- 
EDTA after 24 h at 28 °C. The cell adhesion percentage 
was the ratio of the total numbers of viable cells 
recovered after adhesion to the number of viable cells 
after thawing (Mauger, 2006). 


Cell origin identification 

Total DNA was isolated from fresh A. grahami 
muscle tissue and the AGF II cell line using the DNeasy 
Tissue Kit (BioTeke). Primers PDL1: 5-ACCCCTGGC 
TCCCAAAGC-3' and PDH1: 5'-ATCTTAGCATCTTCA 
GTG-3' were used to amplify the 925 bp of D-loop gene 
using PCR (Liu et al, 2002). The PCR conditions were 
modified as follows: 50 uL mixture of PCR containing 5 
uL of 10x buffer, 1 uL of 10 uM PDLI and PDHI 
primers, 3 uL of 10 mM dNTP, 0.25 uL Supernew Taq, 1 
uL DNA and 36.75 uL ddH;O. PCR was ran at 94 °C 
denaturation for 3 min followed by 35 cycles of 94 'C for 
45 s, 52 °C for 45 s and 72 'C for 90 s, and a final 
extension of 10 min at 72 °C. The purified fragments 
were then sequenced by Sangon Biotech (Shanghai) 
(Yang et al, 2008; Liu et al, 2011). 


Chromosome preparation 

Chromosome spreads were prepared from AGF II 
cells at passages 32. About 10° cells were seeded into a 
T-75 flask and grown in L-15 medium supplemented 
with 10% FBS. After 1 day incubation at 28 'C, 0.2 
ug/mL of colcemid (Sigma) was added to the medium for 
2 h to arrest the cells in metaphase. The majority of cells 
(70%) were dislodged from the flask by gentle tapping 
and then harvested by centrifugation (200 g, 8 min). Cell 
pellets were suspended in a hypotonic solution consisting 
of 0.4% KCI for 15 min, and thereafter fixed in a 3:1 
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mixture of methanol:acetic acid. Chromosomes spread 
on a glass slide were then prepared according to the 
conventional drop-splash technique, followed by staining 
with 5% Giemsa for 8 min (Zan & Song, 1980; Freshney, 
2010) and then the number of chromosomes was counted 
under a Leica DMR microscope (Leica Mikroskopie und 
Systeme GmbH). 


Cell transfection 

Cells were seeded in 24-well plates and transfected 
with the plasmid DNA of two fluorescent proteins 
(pEYFP-N1 and pECFP-N1) by Lipofectamine 2000 
(Invitrogen) at a ratio of 1:2. The medium was renewed 6 
h after transfection, and cells were observed under a 
Leica inverted fluorescence microscope 24 h after 
transfection. The transfection efficiency was determined 
by counting the number of fluorescent protein-positive 
cells as well as the total cell number in 20 independent 
optical fields at a magnification of x400, after 60h post- 
transfection (Ouyang et al, 2010). 


RESULTS 


Primary cell culture and maintenance 

After 5-6 days, cells migrated out from the edge of 
the caudal fin tissues and grew well, forming a 
monolayer within 40 days at 28 °C. The initial cells 
consisted of both epithelioid-like and fibroblast-like cells 
(Figure 1A). After 6 subcultures, the fibroblast-like cells, 
named AGF II (AG, Anabarilius grahami; F, Fin), 
became predominant (Figure 1B). To date of this study, 
the cell line has been subcultured more than 60 passages. 


Growth of cells 

After subculture, AGF II cells progress through a 
characteristic growth pattern: a lag phase (1-3 day), 
exponential phase (3-7 day) and stationary phase (7-9 
day) (Figure 2). The AGF II cells’ population doubling 
time during exponential growth was determined to be 
51.1 h, at a seeding density of 4.8x10* cells/mL with 
DMEM/F12+10% FBS at 28 °C. 


Effect of medium on cell growth 

The effects of different medium on cell growth are 
presented in Figure 3. The cells adhered the first day 
with L-15, DMEM/F12 and DMEM were more than with 
M1640.Afterward, cells were able to grow in L-15 and 
DMEM /F12, with optimal growth at L-15.No obvious 
growth was observed at DMEM and M1640. 


Effect of temperature and FBS concentration on cell 
growth 

Our observations showed that AGF II Cells grew at 
20 C, 24 'C and 32 'C with optimal growth at 28 'C 
(Figure 4A). The cells grew well between 24 'C and 


Volume 33 Issues E5-6 


E92 WANG, et al. 





Figure 1 Photomicrograph of the Anabarilius grahami cell line 
A: Epithelioid-like (e) and fibroblast —like (f) cells from fin explants were 
present after 40 days culture at 28 'C; B: Monolayer of AGFII cells at 
passage 13. Bar-100 um. 


No. of cells x 10*'/mL 





Days 


Figure 2 Growth curve of AGF II cells with 
DMEM/F12+100% FBS at 28 °C 


28 'C, but when the culture temperature declined to 
20 'C, no obvious growth was observed, and the cells 
staid at the lag phase. When the temperature was raised 
to 32 'C, aside from no growth being observed, the 
number of viable cells declined. 

Growth of AGF II cells at varying levels of FBS 
concentration is shown in Figure 4B. Our results show 
that serum was required for the cell to survive and grow, 
and that when concentration of FBS varied between 5% 
and 20%, the higher the concentration of FBS, there was 
a faster cell growth rate, but there was no faster growth 
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Figure 3 Effect of four different media on AGF II cell growth 
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Figure 4 Effect of (A) temperature and (B) FBS concentration 
on AGF II cell growth 


rate observed when the concentration of FBS was 
increased to 25%. 


Effect of cryoprotectant on cell cryopreservation 

A given cryoprotectant and concentration of 
cryoprotectant can affect the quality of thawed cells. Our 
data showed that the viability and ability of cells to 
recover after cryopreservation were significant lower 
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than that of the fresh control(Figure 5A). DMSO 
treatment gave better re-viability percentage (80.36+ 
5.3)% and re-adhesion percentage (44.24+7.5)% than 
others (P«0.01). The re-viability percentage between 
Glycerol (61.29+4.5)% and EG (60.29+5.1)% treatment 
was not significant(P>0.05), nor was the re-adhesion 
percentage. MeOH treatment gained worst re-viability 
and  re-adhesion percentages, (28.344.5)% and 
(2.42+1.0)%, respectively. For both the percentages of 
viability and adhesion of cryopreserved cells, we noted a 
discernible relationships of DMSO>Glycerol and 
EG>MeOH. We likewise found that DMSO is the best 
cryoprotectant for AGFII cells. 
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Figure 5 Effect of different cryoprotectant (A) and concentration 
of DMSO (B) on viability and adhesion percentage of 
recovered AGF II cells at passage 21 after 
cryopreservation 


In addition to the above-mentioned results, all 
graded concentrations of DMSO treatment led to more 
cell losses than those observed with the fresh control 
(P<0.05). Among the 5 treatments, 5% proved to be the 
optimal concentration, whose re-viability and re- 
adhesion percentage were respectively (94.03+5.7)% and 
(51.2+8.0)%. As concentrations of DMSO increase, the 
percentage of viability and adhesion decreased. More 
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than 70% of the thawed cells were viable after 
cryopreservation for a month, and more than 40% could 
be recovered after seeding with the 4 other treatments 
(Figure 5B). 


Cell origin identification 

PCR products of the D-loop gene showed a band 
near | 000 bp by agarose gel electrophoresis (Figure 6). 
Sequencing of this gene obtained a 925 bp length, and 
then comparative analysis showed these sequences from 
fresh A. grahami muscle tissue and from AGF II at 
passage 3, 10 and 20 were identical These data 
confirmed that the origin of the established AGF II cell 
lines were from A. grahami. 


1000 


750 





M P3 P10 


P20 Marker 


Figure 6 Photograph of agarose gel electrophoresis from 
PCR products 
Total DNA from line M was from fresh muscle of A. grahami, and line P3, 
P10 and P20 were representing the AGF II cell line at passages 3,10 and 20, 


respectively. 


Chromosome analysis 

Chromosome morphology of AGF II is presented in 
Figure 7. The result of chromosome counts of 108 
metaphase plates revealed that the chromosome number 
of AGF II cells were widely distributed between 38 and 
52, with a modal peak at 48 chromosomes, and 39.8% of 
all cells contained 48 chromosomes. Aneuploidy was 
observed in the AGF II cell line, though they were small 
in proportion, such as 14.8%, 12.1% and 10.2% of all 
cells contained 50, 49, 47 chromosomes, respectively. 
Diploid karyotype formula of A. grahami fin cell line was 
14 m (metacentric chromosome)+20 sm (sub-metacentric 
chromosome)+14 st (subtelocentric chromosome). 


Cell transfection 

After electroporation, we observed yellow and cyan 
fluorescent proteins in the cell line after 24 h transfection 
(Figure 8). The transfection efficiency was 35% for 
pEYFP-N1 and 10% for pECFP-N1, indicating the 
suitability of this cell line for transfection and gene 
expression studies. 
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Figure 7 Chromosome distribution (A) metaphase (B) and 
diploid karyotype (C) of A. grahami fin cells at 
passage 32. 108 metaphases were counted 


DISCUSSION 


The AGF II is the first cell line of Anabarilius 
grahami and is also the first cell line developed from 
Fuxian Lake (Yang, 1994). The morphology of initial 
cells is the same as other fin cell lines (Zhou et al, 2008; 
Ku et al, 2010). Different cell lines require various 
culture and cryopreservation conditions (Chen & Qin, 
2011;Lakra et al, 2011), suggesting that it is necessary to 
optimize conditions of each cell line and significant to 
identify cell origin to ensure the AGF II comes from A. 
grahami, as well as cell application. 
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Figure 8 Fluorescent micrographs of yellow fluorescent 
protein (YFP, A) and cyan fluorescent protein 
(CFP, B) expression in transfected AGFII cells using 
pEYFP-N1 and pECFP-N1, respectively (x400) 


Culture conditions 

Though different media are used to culture different 
cell lines, DMEM/F12 comes close to being an all 
purpose culture medium for the cells of mammals, birds, 
reptiles, amphibians as well as fish (Wolf & Quimby, 
1969), and L-15 does not require CO; buffering, so they 
have also successfully used in fish cell lines (Leibovitz 
1963). 

The optimal incubation temperature of AGF II cells 
(24 'C-28 °C) is higher than optimal growth temperature 
of adult fish (23 'C-26 °C) and fertilized eggs (20 °C) of 
A. grahami (Li et al, 2003a; Ma et al, 2008). Long-term 
aquarium environment may contribute to these differences 
to a certain extent, because the temperature of A. 
grahami could adapt to 28 'C-29 °C in artificial breeding. 

FBS seems to be the most popular choice of 
supplements in the tissue culture media as it is easy to 
obtain in large volumes as well as for to the presence of 
both known and unknown growth factors (Lakra et al, 
2011). In fact, when using a FBS free medium, the AGF 
II cells did not adhere to the substratum flask, suggesting 
that FBS may play a key role in promoting cell 
attachment and proliferation, which is consistent with the 
case of the Acipenser sinensis fin cell lines (Zhou et al, 
2008). However, FBS concentrations are not usually 
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much higher than 20%, as there is evidence that high 
serum concentrations may inhibit cell growth (Freshney, 
2010; Chen & Qin, 2011). 


Cell cryopreservation 

Freezing lower vertebrate cells is done using the 
same procedure as for homeotherm cells, with optimal 
storage in liquid nitrogen (Wolf & Mann, 1980). 
However, less attention has been paid to the effects of 
different cryoprotectant and concentration on cell 
cryopreservation. DMSO is the most common 
cryoprotectant for cultured cells (Mauger et al, 2006; 
Freshney, 2010), yet Glycerol, EG and MeOH are mostly 
used in cryopreservation of germ cells but not somatic 
cells (Cabrita et al, 2010). All cryoprotectants have 
proved to have some negative effects toward fish cells, 
since the percentage of viability and adhesion of thawed 
cells is lower than fresh cells, which may be relative to 
the mild toxicity of cryoprotectant toward fresh cells, 
apart from the effect of lower temperature (Borenfreund, 
1989; Muchlisin, 2005; Mauger et al, 2006; Cabrita et al, 
2010). DMSO has been demonstrated to be better than 


other cryoprotectant for A. grahami cell cryopreservation. 


Similarly, concentration of DMSO did not influence cell 
cryopreservation, because all different DMSO 
concentration could obtain relative higher viability and 
adhesion than Glycerol, EG and MeOH. 


Cell identification 

Identifying the cell origin after establishment of the 
cell line has been finished is essential. Chromosome 
analysis is the most commonly used method (Freshney, 
2010), and because the mitochondrial DNA control 
region (D-loop) is highly variable and can be used for 
phylogenetic analysis (Yang et al, 2008); accordingly 
they were used to verify whether the AGF II was actually 
derived from A. grahami. Both the PCR amplification 
and chromosome analysis confirmed that the AGF II 
cells were from A. grahami. The D-loop is an important 
non-coding region of mitochondrial DNA that possesses 
a high rate of evolutionary change (Yang et al, 2008). All 
sequences of D-loop gene from 4. grahami brought 
about a discovery of difference between this A. grahami 
and the wider A. grahami species was not obvious. That 
relatively high genetic diversity of from the EFCC may 
contribute in this matter (Yang et al, 2008). 

A modal chromosome number (21-48) of AGF II 
cell line by Karyotype analysis was consistent with Zan 
& Song's (1980) results. The percentages of cells 
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containing modal chromosomes to total cells varied from 
20% to 70%, depending on different species and cell 
lines (Chen & Qin, 2011). The percentage in AGF II is 
39.8%, which is imperfect, but because the materials’ 
origin was not explicitly described in most studies—1.e. 
the use of a wild or domestic population and of which 
generation—so comparing our findings with their results 
is not significant or clearly useful. Conversely, AGF II is 
from the tenth generation of A. grahami through artificial 
propagation, suggesting that to some degree, long-term 
artificial propagation of A. grahami has been successful 
and may further benefit by frequently adding wild A. 
grahami species from different locations (Yang et al, 
2008). Both aneuploidy and  heteroploidy were 
widespread in fish cell lines (Wolf & Quimby, 1969; 
Kang et al, 2003; Ye et al, 2006; Cheng et al, 2010), 
which may be connected with the growth status of cell 
lines (Wolf & Quimby, 1969). 


Cell application 

As other cell lines, AGF II 1s also capable of 
accepting exogenous genes (Zhou et al, 2008; Ou et al, 
2010; Cheng et al, 2010), so we also assessed the 
application of the A. grahami cell line for exogenous 
gene manipulation by examining the ability of AGF II 
cells to express the YFP and CFP gene. Cases where 
different cell lines lead to different transfection 
efficiency (Zhou et al, 2008; Ku et al, 2010) and the 
ability of the same cell to express different gene is also 
distinct (Hameed et al, 2006) may potentially be the 
reason that transfection efficiency of pECFP-N lis lower 
than that of pEYFP-NI. 

In summary, we establish a freshwater fish cell line, 
AGF II, from the caudal fin of A. grahami, and 
subcultured more than 60 passages. This cell line could 
provide an important tool for the further study of 
exogenous gene manipulation among freshwater fish. We 
also expect that the results of this study will attract more 
researchers to pay attention to cell culture of native 
fishes, and protect fish germplasm resource of Yunnan 
province. 
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Stage-specific appearance of cytoplasmic microtubules around the 
surviving nuclei during the third prezygotic division of Paramecium 
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Abstract: There are six micronuclear divisions during conjugation of Paramecium caudatum: three prezygotic and three postzygotic 
divisions. Four haploid nuclei are formed during the first two meiotic prezygotic divisions. Usually only one meiotic product is 
located in the paroral cone (PC) region at the completion of meiosis, which survives and divides mitotically to complete the third 
prezygotic division to yield a stationary and a migratory pronucleus. The remaining three located outside of the PC degenerate. The 
migratory pronuclei are then exchanged between two conjugants and fuse with the stationary pronuclei to form synkarya, which 
undergo three successive divisions (postzygotic divisions). However, little is known about the surviving mechanism of the PC nuclei. 
In the current study, stage-specific appearance of cytoplasmic microtubules (cMTs) was indicated during the third prezygotic division 
by immunofluorescence labeling with anti-alpha tubulin antibodies surrounding the surviving nuclei, including the PC nuclei and the 
two types of prospective pronuclei. This suggested that cMTs were involved in the formation of a physical barrier, whose function 
may relate to sequestering and protecting the surviving nuclei from the major cytoplasm, where degeneration of extra-meiotic 


products occurs, another important nuclear event during the third prezygotic division. 


Keywords: Paramecium; Conjugation; Meiosis; Paroral cone region; Cytoplasmic microtubules 


Ciliates are a group of unicellular eukaryotes with 
nuclear dualism, possessing both polygenomic somatic 
macronuclei and diploid germinal micronuclei derived 
from synkaryon (fertilized nucleus) division products 
through conjugation, one kind of sexual reproduction 
(Orias et al, 2011). Paramecium caudatum is a globally 
distributed ciliate with one micronucleus and one 
macronucleus, and six micronuclear divisions during 
conjugation: three prezygotic and three postzygotic. The 
first two meiotic prezygotic divisions form four haploid 
nuclei,among which one is located and survives in the 
paroral cone region (PC, the area around the degenerated 
oral apparatus) of each conjugant (each cell of a 
conjugating pair), with the remaining three degenerating 
outside the PC. The surviving PC nucleus undergoes a 
third prezygotic division yielding two gametic nuclei: a 
stationary pronucleus (StP) and a migratory pronucleus 
(MiP), corresponding to a female and a male gamete of 
multicellular organisms, respectively. The MiPs then 
exchange reciprocally between two conjugants and fuse 
with the StPs to form synkarya, which divide three times 
successively (postzygotic divisions) to form eight synkaryon 
products (Calkins & Cull, 1907; Wichterman, 1986). 
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In our previous studies, immunofluorescence labeling 
and protargol staining techniques indicated stage-specific 
spindle microtubular behavior at the telophase of the 
third prezygotic division (Gao et al, 2011a, 2011b). 
Concerning cytoplasmic microtubules (cMTs), it has 
been suggested they play a role in pronuclear exchange 
(Nakajima et al, 2001). In the current study, 
immunofluorescence labeling with monoclonal antibodies 
of anti-alpha tubulin indicated stage-specific behavior of 
cytoplasmic microtubules (cMTs) during the third 
prezygotic division in P. caudatum, which might be 
related with the surviving mechanism of meiotic 
products and two gametic pronuclei. 


MATERIAL AND METHODS 


Chemicals 
Immunofluorescence staining-related chemicals 
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including monoclonal antibodies of mouse anti-a tubulin, 
FITC-conjugated goat anti-mouse Ig G, propidium iodide 
(PI), Triton-X 100, 0.5 mol/L EGTA (pH 8.0), 4% 
paraformaldehyde, and RNase A were purchased from 
the Beyotime Institute of Biotechnology (Haimen 
Jiangsu, China). Other chemicals were provided by the 
Hangzhou Dafang Chemical Reagent Inc (China). 


Cell culture and induction of conjugation 

Two complementary mating types of P. caudatum 
were collected from East Lake Campus of Zhejiang A & 
F University (China). Cell culture and conjugation 
induction followed previous studies (Hiwatashi, 1968). 
Conjugating pairs were isolated by iron-dextran particles 
(Sun et al, 2010; Yang & Takahashi, 1999). 


Immunofluorescence staining 

Immunofluorescence staining with monoclonal 
antibodies of anti-a tubulin followed previous studies 
(Gao et al, 2011b; Yang & Takahashi, 2002). 1) Cells 
were fixed in 2% paraformaldehyde diluted in 2 mmol/L 
phosphate buffer (pH 7.0) containing 25 mmol/L KCl 
(PBS). 2) Fixed cells were rinsed three times with 
washing buffer (PBS containing 5 mmol/L MgSOA, 2 
mmol/L EGTA, and 0.05% Triton-X 100). 3) Cells were 
blocked with 1% BSA dissolved in 5 mmol/L NHACI. 4). 
Cells were incubated overnight in 1 000x diluted 
monoclonal antibodies of anti-a tubulin containing 10 
ug/mL RNase A. 5) After three rinses, cells were 
incubated for 2 h in 500x diluted FITC-conjugated goat 
anti-mouse Ig G containing 2.5 ug/mL PI. 6) After a 
brief rinse, the stained cells were observed under a 
fluorescence Olympus DX60 microscope. All experiments 
were performed at room temperature (25 °C). 


RESULTS 


It is well-known that microtubules are involved in 
pronuclear transfer in both Paramecium (Jurand, 1976; 
Nakajima et al, 2001) and Tetrahymena (Orias et al, 
1983). In 2001, cytoplasmic microtubules (CMTs) and 
intranuclear microtubules (nMTs) were reported to play 
important roles in reciprocal nuclear exchange in P. 
caudatum (Nakajima et al, 2001). To determine if any 
cytoplasmic microtubules played any roles in the 
surviving mechanism of  post-meiotic nuclei, 
immunofluorescence labeling with anti-alpha tubulin 
antibodies was performed on the prezygotic conjugating 
pairs. 


No definite orientation of spindle extension during the 
first prezygotic division 

At the telophase of the first prezygotic division, 
anti-a tubulin antibodies recognized long and slender 
spindles and two meiotic products, but no macronuclei 
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were detected (Figure 1A). The PI staining recognized 
two meiotic products and the macronuclei (Figure 1A’). 
Neither definite orientation of spindle extension nor 
definite localization of the first meiotic products was 
observed. As a result, the two meiotic products were 
randomly distributed in the cytoplasm (Figure 1A"), as 
reported previously (Gao et al, 201 1b). 


One telophase nucleus located in PC during the 
second prezygotic division 

During the second prezygotic division, the anti-a 
tubulin antibodies also recognized long and slender 
spindles and four telophase division products. There was 
no definite orientation of spindle extension, but one or 
more telophase nuclei were already located in the PC 
(Figure 1B, B’, B"), as observed previously (Gao et al, 
2011b, 2011c). However, during the observation of ten 
conjugants in the earlier telophase, the spindles showed a 
tendency of extending towards the PC area directly 
(Figure 1C, D, E). 


Microtubular behavior soon after meiosis 

At the completion of meiosis, one meiotic nucleus 
was already located in the PC and nMTs were observed 
in all four meiotic products showing the same 
immunostaining pattern regardless of their locations 
inside or outside the PC (compare yellow and white 
arrows in Figure 2A, A', A"). With the third prezygotic 
division, more anti-a tubulin antibodies recognized tiny 
dots, the cytoplasmic microtubules (cMTs) appeared and 
accumulated around the PC areas of two conjugants 
(Figure 2B, B', B"), and at the metaphase such cMTs 
formed upside down heart shapes (Figure 2C, C', C"). 
However, these cMTs never appeared around the meiotic 
nuclei outside the PC and even the nMTs faded away 
from them (compare yellow and white arrows in Figure 
2C, C"). More than 50 conjugants were observed in each 
case and 100% of cells showed the same characteristics. 


Microtubular behavior around the telophase of the 
third prezygotic division 

At the telophase of the third prezygotic division, 
many tiny cMTs dots still existed and were mainly 
observed around both the prospective StP and MiP (blue 
and red arrows in Figure 3A, A" and B, B", respectively). 
At the stage of pronuclear exchange, however, almost no 
cMTs were observed around the two pronuclei, but 
nMTs still existed (Figure 3C, C"). More than 50 
conjugants were observed in each case and 100% of cells 
showed the same characteristics. 


DISCUSSION 


Previous studies on immunofluorescence labeling 
with anti-o tubulin antibodies have indicated a 
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Figure 1 Nuclear behavior during the first and the second prezygotic divisions of P. caudatum indicated by FITC and 
PI doubled-staining 
A: Telophase of the first prezygotic division; B: Telophase of the second prezygotic division; C, D, E: Earlier telophase of the second prezygotic division, one 
meiotic product entered the PC region. A and B are FITC images; A’ and B' are PI images; A", B", C, D and E are merged images of FITC and PI; A, A’, A" 


and B, B' , B" are the same cell in each row; Triangles: macronuclei; Bars-20 um. 
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Figure 2 Appearance of cytoplasmic microtubules (cMTs) soon after meiosis of P. caudatum indicated by FITC and 
PI doubled-staining 
A, B: Soon after meiosis; C: At the beginning of the third pre-zygotic division. A, B and C are FITC images; A', B' and C' are PI images; A", B" and C" are 
merged images of FITC and PI; A, A’, A" and B, B’, B" and C, C', C" are the same cell in each row; Yellow arrows: meiotic products in the PC; White arrows: 


meiotic products outside the PC; Triangle: old macronuclei; Oval area: the area of cMT appearance; Bars-20 um. 
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Figure 3 Distribution of cMTs before and after the completion of the third prezygotic division of P. caudatum indicated by 
FITC and PI doubled-staining 
A, B: Late telophase of the third prezygotic division; C: Reciprocal migratory pronuclear exchange. A, B and C are FITC images; A', B' and C' are PI images; 
A", B" and C" are merged images of FITC and PI; A, A’, A" and B, B', B" and C, C', C" are the same cell in each row; Yellow arrows: (prospective) migratory 


pronuclei; Blue arrows: (prospective) stationary pronuclei; White arrows: meiotic products outside PC; Triangle: old macronuclei; Bars-20 um. 


Zoological Research www.zoores.ac.cn 


Stage-specific appearance of cytoplasmic microtubules around the surviving nuclei during the third prezygotic division of Paramecium E103 


microtubular function in at least two aspects. The first 
involves a role during reciprocal pronuclear exchange in 
P. caudatum (Nakajima et al, 2001), which is supported 
by different experimental techniques in Tetrahymena 
(Orias et al, 1983) and other Paramecium species (Jurand, 
1976; Yang & Shi, 2007). The second is guiding the 
nuclei to the destined locations including the PC entrance 
at the completion of meiosis and anterior and posterior 
localization at the telophase of the third postzygotic 
division (Gao et al, 2011a, 2011b, 2011c; Yang & 
Takahashi, 2002). 

In the current study, stage- and space-specific cMTs 
were observed by immunofluorescence labeling with 
anti-alpha tubulin antibodies. They appeared during the 
third prezygotic division and were distributed around the 
surviving nuclei including meiotic products in the PC 
(Figure 2) and two prospective StPs and MiPs (Figure 3). 
However, such cMTs appeared neither around the extra- 
degenerating meiotic products (Figure 2C", 3A", B") nor 
the nuclear products of the first prezygotic division 
(Figure 1C", 3C") and the three postzygotic divisions 
(Yang & Takahashi, 2002). These results might indicate 
cMT involvement in the surviving mechanism of the 
post-meiotic nuclei. 

In fact, intranuclear microtubules (nMTs) exist in 
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Abstract: The genus Trachypithecus is the most diverse langur taxon, distributed in southwestern China, south and southeastern Asia. 


In this study, we include 16 recognized Trachypithecus species to reconstruct the phylogeny with particular concern to the taxonomy 


of the three subspecies of T. phayrei using multiple genes. Our results support a sister-relationship between T. p. phayrei and T. p. 


shanicus. However, the mitochondrial CYT B gene supported T. p. crepuscula as a distinct species, but the nuclear PRMI gene 


suggested a closer relationship between T. p. crepuscula and T. p. phayrei. The incongruence between nuclear and mitochondrial 


genes suggests that hybridization may have occurred, a fact that would benefit from re-examination using multiple unlinked nuclear 


genes. 


Keywords: Non-invasive sampling; Partitioned Bayesian phylogenetic analyses; Relaxed molecular clock; Trachypithecus phayrei 


The genus Trachypithecus is the most diverse langur 
taxon, having a broad distribution including India, Sri 
Lanka, Bangladesh, Southwestern China, and Southeast 
Asia (Groves, 2005; Wang et al, 1999). It is 
phylogenetically embedded within the Family 
Cercopithecidae and closely related to Semnopithecus 
(Perelman et al, 2011; Wang et al, 2012). Groves (2005) 
assigned full species status to 17 taxa, which he clustered 
into 5 species groups. While 16 of these species have 
been assessed in other phylogenetic contexts (Bleisch et 
al, 2008; Geissmann et al, 2004; Karanth et al, 2008; 
Liedigk et al, 2009; Nadler et al, 2003; Wang et al, 2012; 
Zhang & Shi, 1993), which has greatly improved our 
understanding of Trachypithecus evolution, there has 
been no dedicated molecular analysis of the complex 
species structure of Trachypithecus. Of particular 
concern is the disputed taxonomy of the endangered 
Phayre’s leaf monkey (T. phayrei). Three putative 
subspecies inhabit Bangladesh, northeastern India, 
Myanmar, Southwestern China, Thailand, Laos, and 
northern Vietnam (Bleisch et al, 2008). Recent genetic 
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analyses have demonstrated that T. p. crepuscula and T. p. 
phayrei do not form a monophyletic clade. T. p. phayrei 
from India is the sister taxon of T. barbei and T. obscures, 
but T. p. crepuscula from Vietnam represents a distinct 
lineage, being a close relative of the T. francoisi species 
group (Karanth et al, 2008; Nadler et al, 2003). 
Accordingly, Nadler et al (2003) and Liedigk et al (2009) 
suggested that T. p. crepuscula from Vietnam should be 
given full species status. However, a lack of genetic 
information from T. p. crepuscula and T. p. shanicus 
(southwestern China) has prevented agreement on the 
purported taxonomy (Bleisch et al, 2008). 

In this study, we sampled T. p. crepuscula and T. p. 
shanicus from southwestern China (Yunnan) and 
northern Myanmar and sequenced both mitochondrial 
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and nuclear genes to resolve the phylogenetic 
relationships of T. phayrei. The sequences of T. p. 
shanicus are represented for the first time. By including 
langur sequences from GenBank into our analysis, we 
test the validity of the species groups proposed by 
Groves (2005). Furthermore, we estimated divergence 
times simultaneously with phylogenetic reconstruction to 
test the diversification pattern of Trachypithecus through 
time (Drummond et al, 2012). This effort will provide 
further evidence to clarify the nebulous classification of 
T. phayrei and the putative Trachypithecus species-group 
divisions. 


MATERIALS AND METHODS 


Sampling and lab work 
Four samples of T. p. crepuscula and five of T. p. 
shanicus were collected from northern Myanmar, Yunnan, 


and Guangxi, China. One sample of T. germaini was 
obtained from the Kunming Cell Bank of the Chinese 
Academy of Sciences (Table 1). These samples are 
tissues from deceased individuals, museum skins, or 
noninvasive feces collections. All animal samples were 
obtained following the regulations of China for the 
implementation of the protection of terrestrial wild 
animals (State Council Decree [1992] No. 13) and 
approved by the Ethics Committee of Kunming Institute 
of Zoology, Chinese Academy of Sciences, China. 

Total DNA was extracted from tissues and museum 
skins using the phenol/proteinase K/sodium dodecyl 
sulphate method (Sambrook & Russell, 2001). Four fecal 
samples were collected from the Gaoligong Mountain 
National Nature Reserve in 2008. Samples were stored 
using the “two-step” storage procedure (Nsubuga et al, 
2004). We extracted total DNA from fecal samples using 
the 2CTAB/PCI method (Vallet et al, 2008). 


Table 1 Sample and sequences used in this study 








Species Collection code Code Sample Locality CYTB PRMI LZM 

à ae ipi la Myanmar01 NM Northern Myanmar KC285863 KC285882 KC285875 

YN_WLS0301001 WL Mt. Wuliang, Yunnan,China KC285866 KC285883 KC285876 

YN XSBNI BNI Xishuangbanna, Yunnan,China KC285864 KC285884 KC285873 

GX KCB89 BN2 Xishuangbanna, Yunnan,China KC285865 KC285885 KC285874 

T. p. shanicus YN GLG0912002 Gl Mt.Gaoligong, Yunnan,China KC285867 KC285886 KC285877 

YN GLG08002 G2 Mt.Gaoligong, Yunnan,China KC285868 KC285887 

YN GLGO08003 G3 Mt.Gaoligong, Yunnan,China KC285869 = KC285878 

YN_GLG08004 G4 Mt.Gaoligong, Yunnan,China KC285870 = KC285879 

YN_GLG08005 G5 Mt.Gaoligong, Yunnan,China KC285871 = KC285880 

T. germaini KCB92007 V2 Northern Vietnam KC285872 KC285888 KC285881 
We sequenced the mitochondrial CYT B and nuclear Inc., USA) and aligned with MUSCLE (Edgar, 2004). 


protamine P1 (PRMI) and lysozyme (LZM) genes in this 
study. CYT B has been widely used in mammalian 
phylogenetic and phylogeographic studies (Bradley & 
Baker, 2001). This gene was amplified using L14724_hk6 
(CCGTGATATGAAAAACCATCGTTG) and H15915 
(Irwin et al, 1991). L14724_hk6 was modified from the 
universal primer L14724 (Irwin et al, 1991) to avoid 
amplification of nuclear pseudogenes of CYT B (Karanth, 
2008). PRMI and LZM have been used previously in the 
phylogenetic study of other primates, including langurs 
(Karanth et al, 2008). These two genes were amplified 
following (Karanth et al, 2008). All PCR products were 
purified using the UNIQ-10 spin column DNA gel 
extraction kit (Sangon, Shanghai, China). Purified products 
were directly sequenced with PCR primers using the 
BigDye Terminator Cycle kit v3.1 on an ABI 3730xl 
sequencer by Tiangen Biotech Co, LTD. (Beijing, China). 


Phylogenetic and molecular dating analyses 
Nucleotide sequences were edited using SeqMan 
and EditSeq in the DNASTAR package v7.1 (DNASTAR, 
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CYT B and coding regions of the two nuclear genes were 
translated into amino acids following the identification of 
any premature stop codon. All these sequences were 
submitted to GenBank (Accession numbers: KC285863 
- KCKC285888). 1141 bp of CYT B, 387 bp of PRMI, 
and 135 bp of LZM were used in further analyses. 
Additional sequences of Catarrhini were downloaded 
from GenBank and added to the alignments 
(Supplementary Table 1, available online). In total, 70 
CYT B, 35 PRMI and 32 LZM sequences representing 36 
species including all three subspecies of Trachypithecus 
phayrei were used in this study. 

We performed partitioned Bayesian analyses 
(Brandley et al, 2005) to reconstruct the phylogenetic 
relationships using BEAST v1.7.2 (Drummond et al, 
2012). The analyses were performed on both the CYT B 
data set as well as the three-gene combined data set. 
Missing data were coded as “?”. The model of DNA 
evolution was determined by a Bayesian Information 
Criterion (BIC) (Schwarz, 1978) in jModelTest v0.1.1 
(Guindon & Gascuel, 2003; Posada, 2008). Because 
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there are missing data in the CYT B alignment, this gene 
was divided into two partitions (partition 1: 1-423 bp, 
826-1 141 bp; partition 2: 424-825 bp). Because 
partition 2 contains no missing data, it was used for the 
estimation of the time of molecular divergence (see 
below). The evolutionary model for each codon position 
of partition 1 and partition 2 of CYT B, LZM, and the 
coding and non-coding regions of PRMI/ were 


CYTB 










determined separately. We did not calculate the 
evolutionary model for each codon position of LZM or 
the coding regions of PRMI to avoid error caused by 
overpartitioning (Brown & Lemmon, 2007) In 
jModelTest, three substitution schemes were selected and 
a proportion of invariant sites were not included in the 
model selection, following He et al (2012). The partition 
information and models selected are shown in Figure 1. 
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Figure 1 Models of molecular evolution used for each gene/partition for phylogenetic reconstruction 


*: partition was used for divergence time estimation. 


We performed a MCMC search of twenty million 
generations, sampling every 2 000 generations. We 
constrained the monophyly of Semnopithecus + 
Trachypithecus according to previous analyses 
(Osterholz et al, 2008; Perelman et al, 2011; Wang et al, 
2012). All analyses were repeated twice. Tracer v1.5 was 
used to make sure all analyses reached the same posterior 
distributions and estimated the convergences by 
calculating effective sample sizes (ESSs) (Rambaut & 
Drummond, 2009). Posterior probabilities (PP) > 0.95 
were considered as strongly supported (Huelsenbeck & 
Rannala, 2004). 

Molecular divergence times were estimated 
simultaneously with the multi-gene phylogenetic 
reconstruction. Because missing data will mislead 
estimates of branch lengths and thus affect the result of 
divergence time estimation (Lemmon et al, 2009), our 
molecular dating analyses were limited to partition 2 of 
the CYT B alignment (402 bp, Figure 1). We used a 
“relaxed” molecular approach (Drummond et al, 2006) 
and used fossil records and secondary calibration for 
dating. Six calibration points were applied as lognormal 
or exponential distributions (Ho, 2007). Saadanius 
hijazensis dated to 29-28 Ma for the hominoid- 
cercopithecoid divergence between 29-28 and 24 Ma 
(Zalmout et al, 2010), so that we set the prior as 
lognormal and offset-24, SD=0.98. The oldest known 
hominoid (Morotopithecus bishopi) is dated to 20.6 Ma 
(Gebo et al, 1997). We set the prior as lognormal. The 
earliest sample age was set to 20.6 Ma (offset-20.6) and 
the older 95% CI to the beginning of the Miocene (24.1 
Ma, SD=0.76). The Homo-Pan divergence occurred 
between 7.2—5.6 Ma (Aiello & Collard, 2001; Senut et al, 
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2001), so we set the prior as lognormal and offset=5.6, 
SD =0.29. The oldest Colobinae was dated to 9.8 Ma 
(Benefit & Pickford, 1986; Nakatsukasa et al, 2010), but 
the molecular divergence estimation indicated a much 
more ancient divergence (Chatterjee et al, 2009), so we 
set the prior as exponential. The earliest possible age was 
set to 9.8 Ma, and the older 95% CI to 20 Ma (mean-3.4). 
The oldest African Colobinae was dated to 6.1 Ma 
(Gilbert et al, 2010). We set the prior as lognormal, the 
earliest possible age to 6.1 Ma, and the 95% CI to the 
beginning of the Late Miocene (11.6 Ma, SD=1.04). The 
most recent common ancestor (MRCA) of 
Semnopithecus and Trachypithecus was estimated to 
have existed at about 4.05 Ma (95% CI-2.93-5.36) 
(Perelman et al, 2011), so we set the prior as lognormal 
(mean-4.03, SD=0.18). Fossils of Trachypithecus have 
been found in Southeast Asia and southern China, most 
of which date to the Pleistocene (Jablonski, 2002). The 
oldest known fossil is T. auratus sangiranensis with an 
age of 1.90.5 Ma (Jablonski & Tyler, 1999); however, 
the site stratigraphy remains in doubt (Larick et al, 2000). 
Other fossils are not suitable for calibration, because of 
taxonomic uncertainty. 

The minimum-spanning tree of PRM/ were derived 
from Network v4.5 using the median-joining approach 
(Bandelt et al, 1999). 





RESULTS 


Phylogenetic relationships 

The combined phylogenetic tree is shown in Figure 
2, and the mitochondrial phylogenetic topology is 
identical to that based on multiple genes. Trachypithecus 
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Figure 2 Phylogenetic relationships of Cartarrhini based on combined mitochondrial and nuclear genes and chronogram of 


Cercopithecidae based on partial CYT B gene 


Node numbers above branches indicate mean divergence time; numbers below branches indicate Bayesian posterior probabilities; node bars indicate the 95% CI 
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taxa fell into two clades. T. pileatus, T. geei from India, T. 
vetulus, and T. johnii cluster with Semnopithecus (PP=1.0), 
which is consistent with previous studies (Chatterjee et al, 
2009; Karanth et al, 2008; Osterholz et al, 2008; Wang et 
al, 2012), suggesting that T. vetulus and T. johnii should 
be merged into the genus Semnopithecus. The rest of the 
Trachypithecus taxa form a monophyletic clade (PP=1.0) 
with five strongly supported lineages recognized 
(PP=1.0). Our results strongly support that T. p. phayrei 
and T. p. shanicus are sister-taxa (PP=1.0) and are close 
relatives to T. barbei and T. obscures (PP=1.0). T. p. 
crepuscula from southwestern China and Vietnam form a 
distinct lineage (PP=1.0) as the sister taxon to the species 
group T. Obscurus +T. cristatus (PP=1.0). T. geei and T. 
pileatus from Bhutan are strongly supported as the sister 
taxa of T. shortridgei (PP—1.0). 

LZM shows low variation in Trachypithecus and 
Semnopithecus. Only two mutations that lead to amino 
acid substitutions were observed (Table 2). As shown 
previously (Karanth et al, 2008), our results indicate that 
the T. pileatus and T. geei from India, which are 
members of the genus Semnopithecus in the phylogenetic 
tree (Figure 2), have the same LZM amino acid 
sequences as the species in the Trachypithecus clade, 
indicating possible hybridization. 

Nucleotide substitutions and insertions/deletions 
(indels) were found in the PRMI alignment in both 
coding and noncoding regions. In the Network tree, the 7: 


© Semnopithecus clade 
O Trachypithecus clade 


C» 1 amino acid deletion 
— 1 amino acid substitution 
— 1 mutation 







geei from India and T. francoisi share the central 
haplotype, which differs from other haplotypes by two to 
nine mutations (Figure 3). These mutations also lead to 
amino acid substitutions/deletions. T. p. shanicus shares 
the same haplotype with 7: germaini and T. obscurus and 
differs from the haplotype shared by T. p. phayrei and T. 
p. crepuscula by one amino acid deletion. 


Table 2 The first ten amino acid sequences of the lysozyme 
protein of presbytini species 





T. cristatus 


Nasalis larvatus M K A L I I L G L V 
Pygathrix nemaeus M A L I I L G L V 
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T. geei (India) M R A L I I L G L V 
T. francoisi M R A L I I L G L V 
T. phayrei phayrei M R A L I I L G L V 
T. phayrei shanicus M R A L I I L G L V 
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Figure 3 The median-joining network of the genus Trachypithecus based on partial PRMI 


Numbers by the branches indicate the number of mutations. 


Divergence time estimation 

The estimates of divergence time including 95% 
credible intervals are shown in Figure 2. The 
Trachypithecus clade diverged from outgroups at about 
5.25 Ma (7.01—3.77 Ma). Diversifications within this 
clade started at about 2.35 Ma (3.43-1.46 Ma) and 
occurred throughout the Pleistocene. T. p. phayrei and T. 
p. shanicus diverged at about 0.37 Ma (0.17—0.63 Ma). 
The MRCA of T. p. crepuscula occurred at about 0.41 
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Ma (0.16—-0.77 Ma). 
DISCUSSION 


As has been discussed previously, the conflicting 
phylogenetic information observed in Trachypithecus 1s 
likely related to hybridization, incomplete lineage sorting, 
and ancestral polymorphism (Karanth et al, 2008; 
Liedigk et al, 2009; Wang et al, 2012). Pleistocene 
glaciations may have caused large scale deforestation 
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throughout southeast Asia, isolating formerly widespread 
species in refugia, and the subsequent recolonization of 
these territories during interglacial periods may have 
allowed for the hybridization of these isolates (Brandon- 
Jones, 1996; Meijaard & Groves, 2006). Consistent with 
the karyotype stability within the subfamily Colobinae 
(O'Brien et al, 2006), hybridization between different 
species of Trachypithecus and between Trachypithecus 
and closely related genera has been observed in captivity 
and could explain the paraphyly of T. pileatus and T. geei 
(Que et al, 2007; Schempp et al, 2008; Wangchuk, 2005; 
Wangchuk et al, 2008). Coupled with hybridization, the 
observed discrepancies between autosomal and 
mitochondrial genes might also have resulted from 
maternal philopatry and male dispersal from natal groups, 
common in living Trachypithecus and possibly the 
ancestral state of the genus, which could have produced 
these different patterns of genetic inheritance (Karanth et 
al, 2008). 

Previous study including T. p. crepuscula from 
Vietnam and 7. p. phayrei from India have found 
Trachypithecus phayrei to be paraphyletic (Karanth et al, 
2008). As a result, it remained unclear whether the 
subspecies T. p. crepuscula is itself monophyletic and 
whether T. p. shanicus is close a relative to either of the 
other two subspecies. For the first time, we found 7. p. 
crepuscula from Vietnam, southwestern China, and 
Myanmar are monophyletic and share a young MCRA at 
about 0.41 Ma and T. p. shanicus diverged from T. p. 
phayrei at about 0.37 Ma. However, T. p. shanicus is 
represented by a different and probably more ancient 
PRMI haplotype than the other two subspecies. This 
might be the result of nuclear introgression from T. p. 
shanicus to the ancestor of T. p. crepuscula due to male 
dispersal and female philopatry. Additional taxon 
sampling of other species and multiple unlinked genes 
are needed for reconstructing these robust evolutionary 
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Abstract: In the present study, we report the first complete mitochondrial genome (mitogenome) of the Painted Jezebel, Delias 
hyparete. The mitogenome of Delias hyparete is 15186 bp in length, and has typical sets of 37 genes: 13 protein-coding genes 
(PCGs), 2 ribosomal RNAs, 22 transfer RNAs and a non-coding A+T-rich region. All protein-coding genes are initiated by ATN 
codons, except for COI, which is tentatively designated by the CGA codon, as observed in other butterfly species. A total of 10 PCGs 
harbored the complete termination codon TAA or TAG, while the COI, COI and NDS genes ended at a single T residue. All 22 tRNA 
genes show typical clover structures, with the exception of the tRNA*"'^9V which lacks the dihydrouridine (DHU) stem and is 
instead replaced by a simple loop. Thirteen intergenic spacers totaling 153 bp, and 13 overlapping regions totaling 46 bp are scattered 
throughout the whole genome. The 377 bp long of D. hyparete A+T-rich region is not comprised of large repetitive sequences, but 
harbors several features characteristic of the lepidopteran insects, including the motif ATAGA followed by an 18 bp poly-T stretch, a 
microsatellite-like (AT); element preceded by the ATTTA motif, an 10 bp polyA-like stretch (AAAAATAAAA) present immediately 


upstream tRNA™*, 
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The insect mitogenome is a circular and double- 
stranded molecule, approximately 14—20 kb in size, 
typically comprised of 37 genes (including 13 PCGs, 22 
tRNA, and 2 rRNA genes) and a non-coding A+T-rich 
region harboring the initiation sites for transcription and 
replication (Wolstenholme, 1992; Boore, 1999; Taanman, 
1999). In recent decades, the mitogenome has been 
widely used in studies of phylogenetics, comparative and 
evolutionary genomics, population genetics and 
molecular evolution, etc. (Ballard & Whitlock, 2004; 
Simonsen et al, 2006), because of its smaller size, faster 
evolutionary rate, maternal inheritance and little 
recombination rather than the nuclear genomics (Vigilan 
et al, 1991; Stoneking & Soodyall, 1996). As of August 
2012, the complete or nearly complete mitochondrial 
genomes have been determined for 376 insect species. 
Among these insect mitogenomes, 25 are from butterfly 
species, and only 3 from Peridae species (Artogeia 
melete, Pieris rapae and Aporia crataegi). 

Found commonly in tropical areas of south China, 
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India, and south-east Asia (Chou, 1998), the Painted 
Jezebel, Delias hyparete Linnaeus, is an attractive 
common garden butterfly species of genus Delias, 
subfamily Pierinae in the family Pieridae. Like many 
other Pieridae species, Delias hyparete is medium sized 
and is not easily mistaken due to its bright wing colours 
and graceful flight pattern. Here, we report the complete 
mitogenome of Delias hyparete, analysis of its 
nucleotide organization and other major characteristics 
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with those of other butterfly species available, with the 
aim of providing more molecular data for studies of this 
organisms molecular identification, population genetics, 
conservation biology, comparative genomics with other 
butterfly species and other relevant areas. 


MATERIALS AND METHODS 


Sample collection and DNA extraction 

Adult individuals of D. hyparete were collected 
from Jinghong, Yunnan, China in July 2006. After 
sample collection, the fresh tissues were preserved in 
100% ethanol immediately and stored at —20 C? until 
DNA extraction. Total genomic DNA was isolated from 
the thoracic muscle of an adult individual using the 
proteinase K-SiO; method, as described by Hao et al 
(2007). 
Primer design, PCR and DNA 
sequencing 

Five short fragment sequences (500—700 bp) of COI, 
COIII, ND4, Cytb and 16S rRNA genes were amplified 
using insect universal primers (Simon et al, 1994; 
Caterino & Sperling, 1999; Simmons & Weller, 2001). 


amplification 


Then, the long PCR primers were designed by Primer 
Premier 5.0 according to the conserved regions based on 
the newly acquired sequences of those short gene 
fragments (Table 1). All primers were synthesized by 
Sangon Biotechnology, Shanghai, China. 

Short fragments were amplified under the following 
condition: 5 min of initial denaturation at 95 C?, 35 
cycles of denaturation at 95 C? for 50 s, annealing at 
45-51 C? (depending on primer pairs) for 50 s, 
extension at 72 C? for 1 min and 30 s, and a final 
extension at 72 C? for 10 min. Long PCR techniques 
were performed using TaKaRa LA Taq polymerase with 
the cycling parameters: an initial denaturation at 95 C? 
for 5 min, followed by 30 cycles of denaturation at 95 C? 
for 50 s, annealing at 46—55 C? (depending on primers) 
for 50 s, extension at 68 C? for 2 min and 30 s during the 
first 15 cycles, and then an additional 5 s per cycle 
during the last 15 cycles, and a final extension at 68 C? 
for 10 min. PCR products were visualized by 
electrophoresis on 1.2% agarose gel, then purified using 
a 3S Spin PCR Product Purification Kit and sequenced 
directly with an ABI-377 automatic DNA sequencer. For 
each long PCR product, the full, double-stranded 
sequence was determined by primer walking. 


Table1 PCR and long-PCR primers used in this study 





Primers Upper primer sequence (5' 3") Lower primer sequence (5'3") TM (©) Size (kb) 
COI* GGTCAACAAATCATAAAGATATTG TAAACTTCAGGGTGACCAAAAAAT 50.0 0.67 
COI-COIII GAACTGAACTGGGAAACC AAGCCTGAAGGATAGAAA 50.5 2.96 
COIII* GTTCTGAGATTTCAGGTAA TTACTAATAAATCATTTGC 49.6 0.65 
COIII-ND4 ATTCCCTTTAATCCTTTCC CGTTTAGGGAGACGAAGA 55.0 3.0 
ND4* TTATGAAAGAAATTCTTT CAGATATAAGGGTTAATT 45.5 0.5 
ND4-Cytb ATTACTCGCAATAAACCG TACAGCAAATCCTCCTCA 46.5 2.6 
Cytb* TATGTACTACCATGAGGACAAATAT ATTACACCTCCTAATTTATTAGGAAT 47.0 0.5 
Cytb-16S ACTGAATCTGAGGAGGAT CTTAGGGATAACAGCGTA 50.0 1.93 
16S* CTGTACAAAGGTAGCATA GCCAAAACTTTAGTCTAG 49.0 0.5 
16S-COI ATTACGCTGTTATCCCTAT TTCGTGGAAAGGCTATGTC 49.8 3.0 


“sk” universal primer 


Sequence analysis and annotation 

The determined sequences were checked firstly with 
the NCBI Internet BLAST search function. The raw 
sequence files were proofread and assembled in BioEdit 
7.0 (Hall, 1999) as well as ClustalX 1.8 (Thompson et al, 
1997). Individual D. hyparete genes and the A+T-rich 
region were identified by aligning the sequences with 
homologous regions of other lepidopteran mitogenome 
sequences using Sequin 5.35. Sequence annotation was 
performed using the DNAStar package (DNAStar Inc., 
Madison, WI, USA) and the online blast tools available 
through the NCBI web site. The concatenated amino acid 
sequences of the 13 PCGs were obtained and analysed by 
ClustalX 1.8 (Thompson et al, 1997) and MEGA 5.0 
(Tamura et al, 2011). Identification of tRNA genes was 
verified using the program tRNAscan-SE 1.21 (Lowe & 
Eddy, 1997). Putative tRNAs that could not be found by 
tRNAscan-SE were identified by sequence comparisons 


Zoological Research 


with other lepidopteran tRNA genes. The tandem repeats 
in the A+T-rich region were predicted using the Tandem 
Repeats Finder available online (http://tandem.bu.edu/ 
trf/trf.html) (Benson, 1999). All mitogenome sequence 
data has been deposited into GenBank under the 
accession number JX094279. 


RESULTS 


Genome organization and structure 

The complete mitogenome of D. hyparete is 15186 
bp long, containing 13 protein-coding genes (ND1-6, 
ND4L, COI-III, Cytb, ATP6, ATP8), 2 ribosomal RNA 
genes (rRNA and srRNA), 22 putative tRNA genes and 
one major non-coding A+T-rich region (control region) 
(Figure 1, Table 2). Like many other insect mitogenomes, 
its major strand codes for 23 genes (9 PCGs and 14 
tRNAs) and the A+T-rich region, while the minor strand 
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codes for the remaining 14 genes (4 PCGs, 8 tRNAs and 
2 rRNA genes). Besides the non-coding A+T-rich region, 
13 intergenic spacer sequences ranging from 1 to 46 bp 
(153 bp in total) and 13 overlapping regions from 1 to 8 
bp (46 bp in total) are spread over the whole mitogenome 
(Table 2). The 7 overlapping nucleotides (ATGATAA) 
are located between ATPS and ATP6. Moreover, the 
overlapped nucleotide sequences were also detected in 
all lepidopteran mitogenomes sequenced to date, and 
function in forming the structure of the hairpin loops for 
posttranslational modifications (Kim et al, 2006; Fenn et 
al, 2007). 


Delias hyparete 


Mitochondrial Genome 
15, 186 bp 





H 


Figure 1 Circular map of Delias hyparete mitochondrial 
genome 
COI, COI and COIII refer to the cytochrome oxidase subunits; Cytb refers 
to cytochrome B; ATP6 and ATP8 refer to subunits 6 and 8 of FO ATPase; 
NDI- 6 refer to components of NADH dehydrogenase. tRNAs are denoted 
as one-letter symbols consistent with the IUPAC-IUB single letter amino 
acid codes. Gene names that are not underlined indicate a clockwise 
transcriptional direction, whereas underlined indicate a counter-clockwise 
transcriptional direction. 


Similar to other available butterfly mitogenomes, 
the nucleotide composition of the whole D. Ayparete 
mitogenome is A+T biased (79.8%), whose value is 
identical to that of Artogeia melete (79.8%), and slightly 
higher than Pieris rapae (79.7%), and well within the 
range of butterfly species known to date, from 79.1% in 
Eumenis autonoe to 82.7% in Coreana raphaelis. 
Furthermore, the A+T content varies profoundly among 
genes and regions of the D. hyparete mitogenome; for 
example, 92.0% for the A+T-rich region, 85.1% for 
SrRNA gene, 80.4% for all the tRNA genes, 83.7% for 
IrRNA gene and 78.4% for all the PCGs (Table 3, Table 
4). The A+T- and G+C-skew of the whole D. hyparete 
mitogenome is —0.018 and —0.228, respectively (Table 4), 
indicating that more Ts and Cs than As and Gs are used. 
Both of the A+T- and G+C-skew values are within the 


Kunming Institute of Zoology (CAS), China Zoological Society 


corresponding values of other lepidopterans, ranging 
from —0.048 (Parathyma sulpitia) to 0.059 (Bombyx 
mori) and from —0.318 (Ochrogaster lunifer) to —0.158 
(Coreana raphaelis), respectively. 


Table 2 Organization of the Delias hyparete mitochondrial 








genome 
Direc- gs Size — Intergenic Start — Stop 
Gene tion Foe (bp) length" codon codon 
tRNAP F 1-65 65 0 
tRNA" F 63-127 65 -3 
tRNA" R 125-193 69 -3 
ND2 F 240-1253 1014 46 ATT TAA 
tRNA"? F 1252-1317 66 -2 
tRNA R 1310-1375 66 -8 
tRNA®” R 1375-1439. 65 -1 
COI F 1442-2972 1531 2 CGA T 
tRNA" UOY® FE 2973-3038 66 0 
COII F 3040-3718 679 1 ATG T 
tRNAP* F 3719-3789 71 0 
tRNA“? F 3789-3854 66 -1 
ATP8 F 3855-4016 162 0 ATT TAA 
ATP6 F 4010-4687 678 -7 ATG TAA 
COII F 4691—5482 792 3 ATG TAA 
tRNA” F 5485-5550 66 2 
ND3 F 5551-5904 354 0 ATT TAG 
tRNA“ F 5903-5968 66 -2 
tRNA” F 5968-6033 66 -1 
tRNA" F 6038-6102 65 4 
(RNAS""6 F — 6103-6163 61 0 
tRNA" F 6164-6231 68 0 
tRNA" R 6240-6303 64 8 
NDS R 6306-8026 1721 2 ATT T 
tRNA“ R — 8051-8115 65 24 
ND4 R 8155-9461 1307 39 ATG T 
ND4L R 9455-9748 294 -1 ATA TAA 
tRNA™ F 9741-9804 64 -8 
tRNA? R 9805-9 869 65 0 
ND6 F 9872-10396 525 2 ATT TAA 
Cytb F 10396-11544 1149 -1 ATG TAA 
tRNA UN F 11546-11610 65 -2 
NDI R 11627-12562 936 16 ATA TAA 
tRNA "C R 12567-12634 68 4 
IrRNA R 12635-13 970 1336 0 
tRNA“ R 13971-14035 65 0 
srRNA R 14036-14809 774 0 
pros 4810-15186 377 0 


"In the column intergenic length, the positive number indicates interval base 
pairs between genes, while the negative number indicates the overlapping 
base pairs between genes. 


Protein-coding genes (PCGs) 
The 13 PCGs are 11 142 bp in length. Their 
nucleotide frequency of the majority (J strand) is T 
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Table 3 Characteristics of butterfly species mitogenomes 
Whole genome PCG? IrRNA srRNA A+T-rich region GenBank 
Taxon Size Size Size accession References 
i 0, a 0, 0, '0, '0, 
Size(bp) A+T(%) No.codons A+T(%) (bp) A+T% (bp) A+T% (bp) A+T% no. 
Papilionidae: — 15389 813 3734 80.1 1344 5839 773 854 504 936 Nc 014053 Kimetal, 
Parnassius bremeri = 2009 
Agehana maraho 16004 80.5 3718 782 1333 83.7 779 855 1258 952 NC 014055 M 
Sericinus montela 15242 80.8 3 691 799 1338 83.6 760 846 408 94.1  HQ259122 Ji etal, 2012 
Teinopalpus aureus 15 242 79.8 3720 78.3 1320 4824 781 85.6 395 93.1 NC 014398 Unpublished 
Nymphalidae 15245 79.7 3717 78.1 1331 839 788 837 430 96.0 NC 013604 Hu etal, 2009 
Acraea issoria = 
; Kim et al, 
Eumenis autonoe 15489 791 3 728 76.8 1335 837 775 852 678 94.6 | GQ868707 2010 
Argyreus hyperbius 15156 — 80.8 3718 79.4 1330 845 778 85.2 349 95.4 NC 015988 ee al, 
Calinaga davidis 15267 804 3737 78.8 1337 838 773 859 389 92.0  HQ658143 i 
: : Kim et al, 
Argynnis nerippe 15140 80.9 3 729 79.7 1321 845 773 849 329 95.7 NC 016419 or 
Apatura ilia 15242 80.5 371 78.9 1333 860 776 849 403 92.5  JF437925 s al, 
vt Tian et al, 
Parathyma sulpitia 15268 81.9 3 729 80.6 1319 84.7 779 857 349 94.6 — JQ347260 2S 
Sasakia charonda 15244 799 3 695 782 1323 844 775 850 380 91.8 NC 014224 Unpublished 
Euploea mulciber 15166  8L5 3713 802 1314 846 776 853 399 93.5 NC 016720 Unpublished 
Libythea celtis 15164 812 3732 80.1 1335 84.7 774 854 328 96.3 NC 016724 Unpublished 
Lycaenidae Kim et al, 
TEAM 15314 827 3 708 81.5 1330 853 777 858 375 942  DQ102703 2006 
Protantigius superans 15248 817 3 720 80.4 1331 851 739 856 361 93.6 NC 016016 eae 
; r . Kim et al, 
Spindasis takanonis 15349 823 3 728 81.1 1333 856 77] 847 371 946 NC 016018 Soila 
Pieridae 15140 79.8 3715 784 1319 834 77] 855 351 892 NC 010568 Hong etal, 
Artogeia melete = 2009 
Pieris rapae 15157 79.7 3721 78.2 1320 84.0 764 850 393 91.6 HM156697 Meo oah 
Aporia crataegi 15140 — 813 3 708 79.9 : 85.4 - 85.5 354 | 952  JNT796473 iol 
Delias hyparete 15186 79.8 3 703 784 1336 83.7 774 854 377 92.0  7X094279 This study 
Hesperoiden 15468 80.5 3 698 789 1343 841 774 864 429 88.1 JF713818  Haoetal, 
Ctenoptilum vasava 2012 
a: Termination codons were excluded in total codon count; b: Protein coding genes. 
Table 4 Nucleotide composition and skewness in different regions of the Delias hyparete mitogenome 
Size Nucleotide composition (%) 
Region b AT-skew GC-skew 
(bp) T C A G ACT 
All gene 15 186 40.6 12.4 39.2 7.8 79.8 —0.018 —0.228 
Genes on J-strand 6 884 43.7 134 332 9.7 76.9 ~0.137 —0.160 
Genes on N-strand 4 258 334 12.8 474 6.4 80.8 0.173 —0.333 
PCGs 11 142 39.8 13.1 38.6 8.4 78.4 -0.015 -0.219 
First codon positions 3 703 38.0 10.0 41.5 10.0 79.5 0.044 0.000 
Second codon positions 3 703 37.0 17:7 33.5 12.1 70.5 —0.050 —0.188 
Third codon positions 3 703 44.0 11.8 40.8 3.3 84.8 —0.038 —0.563 
tRNA 1447 39.1 11.3 413 8.3 80.4 0.027 —0.153 
IrRNA 1336 42.8 10.9 40.9 54 83.7 —0.023 —0.337 
stRNA 774 46.5 10.2 38.6 47 85.1 —0.093 —0.369 
A+T-rich region 377 49.3 45 42.7 3.4 92.0 ~0.072 —0.139 
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(43.7%), A (33.2%), C (13.4%) and G (9.7%), whereas, 
the minority strand (N strand) is A (47.4%), T (33.4%), 
C (12.8%) and G (6.4%) (Table 4). 

All PCGs of the D. hyparete mitogenome are 
initiated by typical ATN codons (ND2, ATP8, ND3, ND5 
and ND6 with ATT; COI, ATP6, COII, ND4 and Cytb 
with ATG; ND4L and NDI with ATA), while the COI 
gene is tentatively designated by the CGA codon. Nine 
PCGs harbored the complete termination codon TAN (8 
genes with TAA, ND3 with TAG) and the remaining 4 
genes (COI, COI, ND5 and ND4) ended with a single T 
right ahead of tRNA genes (Table 2). 

D. hyparete harbors 3 703 codons, excluding 
termination codons, and this number is slightly lower 
than those of Aporia crataegi (3 708) and Coreana 
raphaelis (3 708) (Table 3). The base composition at 
each codon position of the concatenated 13 PCGs 
showed that the A+T content at the third codon position 
(84.8%) was higher than those of the first (79.5%) and 
second (70.5%), consistent with other sequenced 
lepidopteran species. The relative synonymous codon 
usage (RSCU) revealed that NNU and NNA codons were 
greater than 1, indicating that the U and A are more 
frequently used for the third codon positions. The codon 
usage bias and A+T bias of the third codon position in 
PCGs are positively correlated with each other. 
Furthermore, ATT (Ile), AAT (Asn), TTT (Phe) and 
TTA (Leu) are the most frequently used codons, 
accounting for 9.2%, 7.2%, 6.6% and 6.4% of all the 
codons, respectively. 


Transfer RNA genes and ribosomal RNA genes 

In total, 22 transfer RNA genes are interspersed 
throughout the D. Ayparete mitogenome, ranging in 
length from 61 bp (tRNA) to 71 bp (tRNA”*) 
(Table 2). All of them are shown to be folded into the 
expected clover-leaf secondary structures except that the 
tRNA*"4°) lacks the dihydrouridine (DHU) stem, which 
is replaced by a simple loop (Figure 2). The nucleotide 
composition of the 22 tRNA genes (1 447 bp in total size) 
is also A+T biased (80.4%). Sixteen tRNA genes possess 
24 pair mismatches in their stems, including 5 pairs in 
the amino acid acceptor stems, 8 pairs in the DHU stems, 
9 pairs in the anticodon stems and 2 pairs in the TPC 
stems. These mismatched bases are mainly GeU and U*U, 
with the exception of tRNA"! and tRNA‘) that 
exhibit A*C mismatches. 

Like all other insect mitogenome sequences, two 
ribosomal RNA genes (1 336 bp /rRNA and 774 bp 
srRNA) are presented in D. hyparete mitogenome. These 
two rRNA genes are composed of 83.7% and 85.1% A+T 
content, and located between tRNA“““©™ and tRNA™, 
and between tRNA™ and the A+T-rich region, 
respectively (Table 2, Table 3). 
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A+T-rich region 

The 377 bp A+T-rich region of D. hyparete 
mitogenome is located between the srRNA and tRNA“ 
(Figure 1, Table 2) and shows a relatively higher level of 
A+T content (92.0%) than those of other D. hyparete 
mitogenome regions (Table 4). This region does not 
contain any conspicuous macro-repeat units, but harbors 
some structures typical of other butterfly mitogenomes: 
the ON (origin of minority or light strand replication) 
located 18 bp upstream of the 5'-end of the srRNA gene, 
which contains the motif ATAGA followed by an 18 bp 
poly-T stretch; the microsatellite-like elements (TA); and 
(AT); located 196 bp and 273 bp upstream of srRNA 
gene (Figure 3); the microsatellite-like repeat (AT); 
preceded by the ATTTA motif; and an 10 bp polyA-like 
stretch (AAAAATAAAA) present immediately upstream 
of tRNA“ (Figure 3). 


DISCUSSION 


General features of D. hyparete mitogenome 

The 15 186 bp D. hyparete mitogenome is the 
largest among the three available Pieridae species, 
however its size is well within the range detected in 
completely sequenced butterfly species, ranging from 15 
140 bp in Artogeia melete (Hong et al, 2009), Aporia 
crataegi (Park et al, 2012) and Argynnis nerippe (Kim et 
al, 2011b) to 16 094 bp in Agehana maraho (Wu et al, 
2010). Likewise, the gene content, orientation and order 
are identical to the majority of other lepidopterans. This 
type of gene arrangement has been considered to be a 
synapomorphy for the order Lepidoptera (Kim et al, 
2011a), differing from those ancestral insects due to the 
movement of tRNA” to a position 5'-upstream of 
tRNA‘, resulting in the order tRNA™“, tRNA", and 
tRNA", rather than the order tRNA", tRNA", and 
tRNA“ (Boore et al, 1998). 

Although the gene content of the insect mitogenome 
is highly conserved, specific characteristics vary 
considerably among its different groups, e.g. variable 
length of A+T-rich region, different number of tRNA 
genes, and variable organization of genes (Hong et al, 
2009). For instance, Coreana rapaelis has an extra copy 
of tRNA*(AGN) (Kim et al, 2006), Acraea issoria 
harbors an extra copy of tRNA” ^Y (Hu et al, 2010), 
Parnassius bremeri harbors two tRNA™ -like and 
tRNA") like sequences (Kim et al, 2009), Argynnis 
nerippe possesses two extra tRNA™-like and tRNA" U Y®- 
like sequences (Kim et al, 2011b), and Ctenoptilum 
vasava includes an extra copy of the trnS (AGN) gene 
and a tRNA-like insertion trnL (UUR) (Hao et al, 2012). 


PCGs 
To date, no agreement has been reached concerning 
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Figure 2 Predicted secondary clover-leaf structure of the Delias hyparete 22 tRNA genes 
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The origin of minority strand replication 
ATATTAAATATTTAATATAAATTATAAATATTTTAATAATTTCTTTCTTTCTTTTTTCATAATATGTTTATAAA 
TATAATTTATTAATAAACGATCAATAATAAGTGTAAATAAAGAATAATTATAAAAGTAATATTAATTTTTCA 
ATA TATATATATATTATTAATAATTATTTAAATTTTTAATATAATTTTTATATTTATAATATATTAAA 


Microsatellite-like (TA)5 element 


AATTTAATATACATATATATATATATGTGCACGTATATTAAATTAATTAAGTAAATTTTTGTTAAATTTACTC 


Mircrosatellite-like (AT)7 element 


Poly-A-like stretch (in the J strand) 


Figure 3 The structure of the A+T-rich region of the Delias hyparete mitochondrial genome 


the insect COI start codons. For example, the 
trinucleotide TTG was supposed to be the initiation 
codon for COI gene in the Caligula boisduvalii (Hong et 
al, 2008), Acraea issoria (Hu et al, 2010) and Calinaga 
davidis (Xia et al, 2011); ATA for Sericinus montela (Ji 
et al, 2012), ATT for Ctenoptilum vasava (Hao et al, 
2012) and Aporia crataegi (Park et al, 2012); the 
tetranucleotide TTAG for Antheraea pernyi (Liu et al, 
2008) and Coreana raphaelis (Kim et al, 2006); and the 
hexanucleotides TATTAG for Ostrinia nubilalis and O. 
furnicalis (Coates et al, 2005), ATTACG for Papilio 
xuthus (Feng et al, 2010), TTAAAG for Pieris rapae 
(Mao et al, 2010). Here, by sequence homology 
comparison with other available lepidopteran species, we 
tentatively presumed CGA as the start codon for COI 
gene (Lee et al, 2006; Jiang et al, 2009; Hong et al, 2009; 
Kim et al, 2009; Kim et al, 2010; Wang et al, 2011; Tian 
et al, 2012, Chen et al, 2012). 


Transfer and ribosomal RNA genes 

The D. hyparete mitogenome harbors 22 tRNA 
genes scattered throughout the whole genome. Unlike 
some other butterfly species, no extra tRNA gene has 
been detectedin D. hyparete (Kim et al, 2006; Hu et al, 
2010; Kim et al, 2009; Kim et al, 2011b). The D. 
hyparete tRNAs harbor a total of 24 unmatched base 
pairs within the stem region: 17 are G*U pairs, the 
remaining 7 are atypical pairs, including 5 U*U pairs and 
2 A*C pairs, all of which form a weak bond in the tRNAs 
and could be corrected through RNA-editing 
mechanisms (Yokobori & Pääbo, 1995; Lavrov et al, 
2000). All tRNAs possess an invariable 7 bp in the 
aminoacyl stem, as commonly found in other butterfly 
species (Kim et al, 2009; Kim et al, 2010; Chen et al, 
2012), 21 tRNAs harbor 7 bp in the anticodon loop 
(except 9 bp in tRNA"), and 19 tRNAs harbor 5 bp in 
the anticodon stem (except 4 base pairs in tRNA”, 
tRNA™ and tRNA”). Additionally, most tRNA size 
variations result from length variations in the DHU and 
TYC arms, within which loop sizes (3—9 bp) are slightly 
more variable than the stem sizes (3—5 bp). 

In D. hyparete, the two rRNA locations are the same 
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as other lepidopterans. Their sizes are likewise well 
within the range of other known butterfly species, from 1 
314 bp of Euploea mulciber (Unpublished, NC 016720) 
to 1344 bp of Parnassius bremeri (Kim et al, 2009) for 
IrRNA, and from 739 bp of Protantigius superans (Kim 
et al, 2011a) to 788 bp of Acraea issoria (Hu et al, 2010) 
for srRNA, respectively. The A+T content value of the 
two rRNA genes is consistent with those observed in 
other available butterfly species (Table 3). 


Non-coding regions 

Intergenic spacer sequences The four spacers longer 
than 10 bp are located respectively between tRNA“ and 
ND2 (46 bp), tRNA"" and ND4 (39 bp), ND5 and tRNAP^ 
(24 bp) and tRNA?" V and ND1(16 bp). The remaining 
9 spacers shorter than 10 bp are dispersed throughout the 
whole genome, accounting for 18.3% of the total length 
of entire intergenic spacers. 

The first longer intergenic spacer located between 
tRNA“ and ND2 is not only found in D. hyparete, but 
has also been detected in other sequenced lepidopterans 
(Lee et al, 2006; Kim et al, 2009; Hu et al, 2010; Kim et 
al, 2010), ranging in size from 40 bp in Parnassius 
bremeri to 87 bp in Sasakia charonda. This intergenic 
spacer is usually considered a constitutive synapomorphic 
feature and a molecular signature of lepidopteran 
mitogenomes due to its absence in non-lepidopteran 
insects (Junqueira et al, 2004; Cha et al, 2007; Cameron 
& Whiting, 2008). Additionally, the 67.4% homology of 
this intergenic spacer with its neighboring ND2 gene 
reveals its ND2 duplication origin (Kim et al, 2009; Kim 
et al, 2010), similar to the cases presented in other 
lepidopterans, including Artogeia melete (68.8%) (Hong 
et al, 2009), Pieris rapae (71.7%) (Mao et al, 2010), 
Coreana raphaelis (62.5%) (Kim et al, 2006), Eumenis 
autonoe (74%) (Kim et al, 2010), Parnassius bremeri 
(70%) (Kim et al, 2009), Bombyx mandarina (70.2%) 
(Pan et al, 2008). 

The second largest spacer is inserted between 
tRNA" and ND4 (39 bp), which is only correspondingly 
detected in the Acraea issoria mitogenome (4 bp) (Hu et 
al, 2010), while absent in other sequenced butterfly 
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species (e.g., Kim et al, 2006; Kim et al, 2009; Kim et al, 
2010; Wang et al, 2011; Hong et al, 2009; Mao et al, 
2010; Ji et al, 2012; Hao et al, 2012). The sequence 
alignment of this spacer with the neighboring ND4 gene 
revealed a sequence homology of 82.1%, also suggesting 
its origin of ND4 duplication (Figure 4). The third largest 
spacer, located between ND5 and tRNA”, is 24 bp long 


Delias hyparete (82.1%) 


and this region is highly variable in length among the 
butterfly species (8-24 bp) (Hu et al, 2010; Kim et al, 
2006; Wang et al, 2011; Ji et al, 2012) and sometimes 
even overlaps between the two genes, such as in 
Parnassius bremeri (Kim et al, 2009) and Eumenis 
autonoe (Kim et al, 2010). 


spacer(8116-8154) TAATAAATTTAAAATCAAA - - -- TICTATAATCAAATTT- - -TAAA 


ND4(8904-8949) 


* KOR OK ok ok 


wR KKK k k k 


TCTTAAATATAAAATTAAAGAATTTCTTTCATAAAATTTATATAAA 


Mekckok k kk kok ok ok Gn k eK k k 


Figure 4 Sequence alignment of the intergenic spacer located between tRNA"5 and ND4 gene with its neighboring partial ND4 gene 


In some butterfly species, such as Acraea issoria, 
Argyreus hyperbius, Calinaga davidis, Argynnis nerippe 
and Parathyma sulpitia there exists 2 bp overlapping 
sequence resided between the :iRNA?""-" and NDI 
genes. However, for D. hyparete, which we used in this 
study and other butterfly (moth) species, intergenic 
spacer sequences are detected, such as 9 bp spacer in 


tRNASer(UCN) 
a 


Coreana raphaelis 






Protantigius superans 
Parnassius bremeri 

Ostrinia nubilalis ATACTA. 
Delias hyparete 


Apatura ilia 


Diatraea saccharalis (Unpublished, NC 013274) and 38 
bp spacer in Ostrinia nubilalis (Coates et al, 2005). All 
of these intergenic spacers harbor the ATACTAA motif, 
which is conserved in all sequenced lepidopteran species 
and has been suggested as a possible recognition site for 
the transcription termination peptide (mtTERM protein) 
(Cameron & Whiting, 2008; Taanman, 1999) (Figure 5). 


NDI 
—«JT— —— 


ATACTAA|TTATTTTTTATATTATAA 
ATACTAA 
ATACTAA| AATTAATAACTAACT 


TTATTTTTTATATTATAAAAT 


AAATATTAACTTACTTACTTAATTAATTCTACTAAAATA 


TCTATTAATTT|ATACTAA AAATAATTATTAATTAAA 
TTAATIT|ATACTAA|AATCATTITAATAATTTA 


Hyphantria cunea ATTAATTTTTATTTTTATCTT|ATACTAA|AATTAATTGATTAATTATGCTAA 


Artogeia melete 
Eumenis autonoe 
Ctenoptilum vasava TAATIT|A 
TTT|^ 
ATTAATTTTITATTA |^ 
TATTAATTTITATICA f 


TATTAATTTAATT f 


Pieris rapae 
Bombyx mori 
Bombyx mandarina 


Saturnia boisduvalii 





TTAATIT|ATACTAA AAATATTTATTAAATAA 
AAATTT|ATACTAA|ATTTATTTATTTTTTAATTTAAT 


AAATATATTATTAAATTA 


AAATATITATTAAATAATAAA 


ATATTACAATTAAAATA 


AAATATTACAATTAAAAT 
AAATAATTCAACTATAATAAA 


Figure 5 Sequence alignment of the internal spacer regions located between tRNA” C and ND] 


Boxed nucleotides indicate the conserved heptanucleotide region (ATACTAA) detected in all sequenced lepidopteran insects. Underlined nucleotides indicate 


the adjacent partial sequences of the (RNAS CV 
A+T-rich region As in other lepidopteran insects, the 
A- T-rich region of D. hyparete mitogenome includes the 
origin sites for transcription and replication (Taanman, 
1999). Its 377 bp size is well within the range observed 
in the completely sequenced butterfly species, ranging from 
328 bp in Libythea celtis (Unpublished, NC 016724) to 
1270 bp in Papilio maraho (Wu et al, 2010). The A+T 
content of D. hyparete A+T-rich region (9296) is 
slightlylower than the corresponding regions of Aporia 
crataegi (95.2%) (Park et al, 2012) within the determined 
Pieridae species, but this value is also within the range 
from 88.1% in Ctenoptilum vasava (Hao et al, 2012) to 
96.3% in Libythea celtis (Unpublished, NC 016724). 
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gene and ND/ gene. Arrows indicate the transcriptional direction. 


The presence of varying copy numbers of tandem 
repeated elements in the mitochondrial A+T-rich region 
has frequently been reported in insects (Zhang et al, 1995; 
Cameron & Whiting, 2007), including some lepidopteran 
insects. For example, the Antheraea pernyi A+T-rich 
region harbors a repeat element of 38 bp tandemly 
repeated 6 times (Liu et al, 2008), the A. proylei has 5 
repeat elements (Arunkumar et al, 2006; Liu et al, 2008), 
Japanese Bombyx  mandarina harbors a tandem 
triplication of a 126 bp repeat unit, while only one of the 
repeat elements is found in the A+T-rich region of B. 
mori and Chinese B. mandarina (Yukuhiro et al, 2002; 
Pan et al, 2008). Within Papilionoidea, the nymphalid 
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Eumenis autonoe is the only species reported to have a 
tandem repeat sequence, which harbors 10 identical 27 
bp long tandem repeats and one 13 bp long incomplete 
final repeat ( Kim et al, 2010). In this study, no large 
sequence repeats were detected in the D. hyparete A+T- 
rich region, which harbors several features common to 
the other lepidopteran insects, such as the motif ATAGA 
followed by 18 bp long poly-T stretches close to srRNA 
gene, and a microsatellite-like (AT)s preceded by the 
ATTTA motif and a (AT), element (Fig. 3). The 
microsatellite-like repeat preceded by the ATTTA motif 
is common in insect mitogenomes, and has been found in 
all other lepidopteran species. For example, ATTTA 
(TA)g in Hyphantria cunea (Liao et al, 2010), ATTTA 
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Peak identification for ChIP-seq data with no controls 
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Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is increasingly being used for genome-wide profiling 
of transcriptional regulation, as this technique enables dissection of the gene regulatory networks. With input as control, a variety of 
statistical methods have been proposed for identifying the enriched regions in the genome, i.e., the transcriptional factor binding sites 
and chromatin modifications. However, when there are no controls, whether peak calling is still reliable awaits systematic 
evaluations. To address this question, we used a Bayesian framework approach to show the effectiveness of peak calling without 
controls (PCWC). Using several different types of ChIP-seq data, we demonstrated the relatively high accuracy of PCWC with less 
than a 5% false discovery rate (FDR). Compared with previously published methods, e.g., the model-based analysis of ChIP-seq 
(MACS), PCWC is reliable with lower FDR. Furthermore, to interpret the biological significance of the called peaks, in combination 
with microarray gene expression data, gene ontology annotation and subsequent motif discovery, our results indicate PCWC 
possesses a high efficiency. Additionally, using in silico data, only a small number of peaks were identified, suggesting the 


significantly low FDR for PCWC. 


Keywords: ChIP-seq; Bayesian; Peak calling; Gene regulation 


With the advance of high-throughput sequencing 
technologies, both global survey of the genome structure 
and transcriptome and transcriptional regulation has 
become more accurate and sensitive. As one of the more 
powerful and widely used experimental techniques for 
DNA-protein interactions in vivo, chromatin 
immunoprecipitation followed by sequencing (ChIP-seq) 
is one of the early applications of next generation 
sequencing (NGS). Since the first study in 2007 
(Mikkelsen et al), ChIP-seq technology has been 
increasingly used for mapping DNA-binding sites by 
transcriptional factors and chromatin modifications. 
Compared with the earlier method of chromatin 
immunoprecipitation followed by tiling microarray 
(ChIP-chip) (Kapranov et al, 2002), ChIP-seq has more 
advantages, e.g., higher resolution, cost-effectiveness and 
technical simplification. At the same time, ChIP-seq 
Itself also has several disadvantages, including efficiently 
computational analysis techniques, sequencing depth 
requirement, and the like (Park, 2009). 

Currently, there are three commonly used types of 
control samples in ChIP experiments: input DNA (no 
immunoprecipitation (IP) DNA samples); mock IP DNA 
(DNA obtained from IP without antibodies); and DNA 
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from non-specific IP (such as IP with immunoglobulin 
G). Concomitantly, several tools for ChIP-seq analysis 
have been developed, the majority of which require a 
matched control sample to determine fold enrichment 
and significance of peak signals. This 1dea of fold ratio 
relative to controls used for ChIP-seq is similar with 
ChIP-chip data analysis, which in the currently used 
methods is necessary for peak calling (Rozowsky et al, 
2009; Valouev et al, 2008). Meanwhile, several new 
methods based on Validation Discriminant Analysis 
(Micsinai et al, 2012), Hidden Markov Models (Choi et 
al, 2009) or Bayesian Models (Spyrou et al, 2009) 
combine ChIP-seq and ChIP-chip data for peak 
identification. Nonetheless, a systematic analysis of 
ChIP-chip and ChIP-seq datasets revealed that the input 
data has variable effects on peak finding (Ho et al, 2011), 
suggesting of the need for a high quality input sample for 
peak calling. Additionally, sequencing depth significantly 
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impacts the peak calling, further causing bias of peak 
identification (Chen et al, 2012). To improve peak calling 
accuracy, Osmanbeyoglu et al (2012) utilized a strategy 
of co-regulation binding by integrating multiple sources 
of biological information. Recently, although several 
novel tools for peak calling without controls have been 
made available (Cairns et al, 2011; Fejes et al, 2008; 
Hower et al, 2011), for example, the model-based 
analysis of ChIP-seq (MACS) defines a dynamic 
parameter to capture local biases when a control profile 
is unavailable (Zhang et al, 2008). No systematic 
evaluation of peak calling without controls has been 
carried out, even for MACS or BayesPeak. 

Given the bias of control data in the ChIP-seq peak 
calling, we sought to address whether ChIP-seq peak 
calling without controls (PCWC) is plausible. Employing 
the Bayesian theorem and simulation-based empirical 
estimates of background distribution, we demonstrated 
that our method ChIP-seq peak identification with no 
control is reliable, with an FDR lower than 5%. 
Compared with the Poisson-based MACS method, our 
Bayesian framework strategies showed lower FDRs, 
suggesting high selectivity and effectiveness for peak 
calling. The systematic analysis of PCWC we present in 
the current study could serve as an alternative strategy 
for ChIP-seq analysis and subsequent biological 
interpretation of gene regulation. 


MATERIALS AND METHODS 


Data sets 

In order to present the comprehensive performance 
of PCWC, we used the FASTQ-formatted ChIP-seq data 
sets downloaded from Gene Expression Omnibus (GEO) 
(Edgar et al, 2002), including the ChIP-seq data of 
transcription factor E2F1 in mESCs (Chen et al, 2008), 
VDR binding in human lymphoblastoid cell 
(Ramagopalan et al, 2010) and H3K4me3 in mESCs 
(Creyghton et al, 2010), as well as the ABI SOLID ChIP- 
seq data sets of EGR1 ChIP-seq data (Tang et al, 2010) 
and MNase-seq data in two replicates (Valouev et al, 
2011). Furthermore, we integrated the microarray gene 


expression dataset to evaluate the effectiveness of PCWC. 


Additionally, we in silico generated 1x10" 36-bp reads 
using the simreads program of the Rmap package (Smith 
et al, 2009) to simulate control samples for peak calling. 
The detailed description for ChIP-seq is shown in Table 
S1. 


Peak calling based on the Bayesian framework 

On the genome-wide scale, peaks in the ChIP-seq 
experiment are those regions significantly enriched by 
reads. Thus, regarding all uniquely mapped reads of the 
ChIP-seq as background, the density of reads in peak 
regions should be dominantly enriched. In the Bayesian 
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framework (Madigan & Ridgeway, 2003), the posterior 
density for 0 is obtained, up to a proportionality constant 
by multiplying the prior density g(0) by the likelihood 
L(0) where 0 denotes the probability that a region is a 
true peak: 
Pr(0 | data) x L(0)g(0) (1) 
Based on global deep sequencing that reaches to 
single nucleotide resolution, a uniform prior distribution 
is usually supposed, which has been adopted in the 
RNA-seq analysis (Sultan et al, 2008; Wang et al, 2008). 
Here, for simplification and feasibility, we assume a 
uniform distribution for the prior distribution g(0) 
( g(0)-U ), so g(0) = 1, for@ , leading to the 
equation: 


Pr(0 | data) = f L(0)d0 Q) 


For measuring the significant enrichment of reads in 
a peak, we considered that the candidate peaks with the 
number of reads (#reads, a) below a cutoff threshold (c, 
usually 0.01) would fit the following equation, 
Pr( reads > a) =1—Pr(#reads < a) 


-i- f." Ley dee. 9) 


where the cumulative density of #reads reaching to a is 
based on likelihood function L(0) and L(0) is empirically 
estimated from the genome-wide scanning of tag-count 
(10,000 w bp random bins per chromosome). 

Following the optimal parameter a below c, we can 
obtain the read-count (uniquely mapped total reads) with 
a sliding window of size w/2 bp (w=50 bp in default). 
Afterward, the #reads in each w/2 bp sliding window size 
above a 1s regarded as candidate peaks. Peaks spanning 
lower than 100 bp are discarded and the neighboring 
peaks within 1 kb are merged (Guenther et al, 2008) to 
counteract the shifting effects of aligned tags from 
forward and reverse reads. 


False discovery rate (FDR) estimate for the number 
of peaks 

Similar to methods used by Valouev et al (2008), the 
overlapped number of called peaks (using the same 
threshold) between the input dataset and ChIP dataset are 
the false discovery number, and the FDR is the false 
discovery number divided by the total number of peaks 
called in the ChIP experiment. 


Motif analysis 

We used MEME 4.6.1 to discover motifs (Bailey & 
Elkan, 1994). Since the running time can be prohibitively 
long for large sets, the peaks are ranked by the number of 
uniquely aligned reads and only the top 1096 of the peaks 
were selected for motif discovery. A similar 
methodological strategy was used by Ramagopalan et al 
(2010) and Jothi et al (2008). The location of the peak is 
centered and extended by 100 bp on either side. Motifs 
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between 5 and 30 bp in length were determined on both 
strands. 


Microarray data preprocessing and analysis 

A total of 45 microarray data sets for calcitriol 
stimulation on human lymphoblastoid cell lines were 
used for expression analysis. The multi-array average 
(RMA) expression values were calculated using the 
Bioconductor “affy” package (Gentleman et al, 2004). 
The log2 expression signals were used to calculate fold 
change. 


Gene annotation 

Genes regulated by trans-acting factors (or by 
chromatin modification, such as H3K4me3) were defined 
by no more than 5 kb distance of peaks to Refseq 
transcription start site (TSS). The enriched analysis of 
genes was achieved using the DAVID annotation system 
(Huang da et al, 2009). After Benjamini-Hochberg 
correction, P<0.01 were considered significantly 
enriched. 

Peak calling based on the Bayesian model and other 
bioinformatics analyses were implemented in Perl and R 
(data available upon request). 


RESULTS AND DISCUSSION 

To demonstrate the effectiveness of ChIP-seq 
PCWC, we selected five representative ChIP-seq data 
sets from two platforms (three from Illumina and two 
from ABI SOLID platforms, respectively) and one in 
silico data (See Materials and Methods for details). 

We first applied the Bayesian-based approach on 
recently published ChIP-seq data for transcription factor 
E2F1 in mouse embryonic stem cells (mESCs) (Chen et 
al, 2008), vitamin D receptor (VDR) binding in human 
lymphoblastoid cell (Ramagopalan et al, 2010) and H3 
trimethylated at lysine 4 (H3K4me3) in mESCs 
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(Creyghton et al, 2010), all of which were generated 
from the Illumina platform. Employing SOAP2 program 
(Li et al, 2009) with maximal 1 mismatch, the uniquely 
mapped reads are considered for further analysis. Using a 
cutoff threshold ( € 20.01 or 0.001), the number of reads 
a based on 10 000 random bins per chromosome with 
window size (w=50) was estimated (Figure 1). With 
these two parameters, we can empirically determine the 
candidate peaks for each data set. 
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Figure 1 Cumulative density of reads based on genome-wide 
random bins 


Grey dashed line represents the read cutoff for peak calling. 


Vitamin D receptor data 

For the VDR ChIP-seq samples, it is highly suitable 
for evaluating the ChIP-seq peak calling without controls 
due to the inclusion of conditional data (calcitriol treated 
or not) with deep sequencing (ranging from 7.9x10° to 
1.46x10’ uniquely mapped reads, see Table 1) and the 
available microarray gene expression data set. 


Table 1 Summary of the peak calling analysis using the VDR ChIP-seq data 





Samples* Description #Reads #Uniquely aligned Pr (#reads>a)<0.001 # Peaks Mean length? 
SRX022390 unstimulated repl 18 252 156 13 487 568 7 1 989 633.9 
SRX022391 unstimulated_rep2 15 379 663 10 761 914 7 548 708.6 
SRX022392 vitaminD repl 18 391 888 13 937 615 7 2920 455.4 
SRX022393 vitaminD rep2 18 965 010 14 613 990 7 2 835 467.3 
SRX022394 unstimulated repl 12 264 149 10 149 547 7 2223 519.3 
SRX022395 unstimulated rep2 10 145 026 7 930 475 7 5 298 460.7 
SRX022396 vitaminD repl 13 302 506 10 657 775 9 6 755 428.9 
SRX022397 vitaminD rep2 14 526 186 11 755 087 9 6981 434.9 
SRX022398 Input] 15 001 356 11 403 930 6 348 - 
SRX022399 Input2 13 682 506 11 402 869 6 545 = 


*: Sample ID is the accession ID for meta-data documented in GEO database; ^ 
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Firstly, we sought to call peaks in both conditions. 
In the samples not treated with calcitriol, the number of 
peaks without control is between 548 to 5298, while in 
the calcitriol-treated samples the number of peaks is 
between 2835 to 6981 (Table 1), consistent with the 
Ramagopalan's reports (Ramagopalan et al, 2010), where 
they used MACS with two independent controls for peak 
calling. Interestingly, our Bayesian-based PCWC get a 
similar number of peaks compared with the data using 
MACS with controls. 

Next, we conducted analyses on multiple scales. 
The resolution of peaks (VDR binding sites) is also 
comparable with the reported study (Table 1). 
Furthermore, the gene ontology (GO) annotation 
indicates that immune system development (GO:0002520), 
lymphocyte activation (GO:0046649) and T cell 
activation (GO:0042110) are significantly enriched 
(P<0.01, after Benjamini correction), also highly 
concordant with the report (Ramagopalan et al, 2010). 
Additionally, previous studies have experimentally 
validated that VDR (auto-regulation) (Zella et al, 2010), 
CCNC (Sinkkonen et al, 2005), ALOX5 (Seuter et al, 
2007), IRF8 and PTPN2 (Ramagopalan et al, 2010) 
modulated by VDR have significant enrichment of peaks 
(VDR binding) under calcitriol-treated conditions, which 
were all confirmed using our method (Figure SI), 
suggesting the effectiveness of our method for ChIP-seq 
peak calling. 

We then integrated the microarray gene expression 
data (a total of 45 microarrays) to evaluate the accuracy 
of PCWC. Compared with the untreated condition, a 
total of 205 genes (fold change>1.5) were up-regulated 
with calcitriol treatment, of which 134 (65.4%) genes 
were significantly enriched with the called peaks. Of the 
134 genes, 81 (60.4%) genes enriched with peaks are 
located in the intronic regions, 33 genes in the TSS 
regions (TSS+500) bp, consistent with the report that 
under the condition of the calcitriol stimulation, there is 
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an increased VDR binding in the intronic regions 
(Ramagopalan et al, 2010), further suggesting the high 
accuracy of our Bayesian model for PCWC. 

For ChIP-seq data analysis, in addition to the 
integration of gene expression data, the subsequent motif 
discovery analysis for the enriched regions is also 
informative to interpret the biological implications (Park, 
2009). Thus, we carried out a motif analysis of VDR 
binding (see Methods). Due to the prohibitively long 
running time for large data sets, only the top 10% of the 
peaks were used for motif discovery. Figure 2 shows the 
motifs discovered by our method, which are nearly 
identical with the reported data (Ramagopalan et al, 
2010). Collectively, the data presented using the VDR 
dataset indicates that the PCWC is reliable. 
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Figure 2 Inferred consensus binding motif for VDR 


For PCWC, evaluate whether the called peaks are 
real signals relative to the flanking regions is crucial. We 
therefore conducted such analysis based on the ratio of 
peaks to the flanking regions. For each peak, we selected 
300 bp region at both the 5' and 3' flanking sequences. 
As shown in Figure 3, compared with the 5' and 3' 
flanking regions, the peak regions have much higher (at 
least 2 times higher) read coverage, indicating the called 
peaks without control are relatively reliable. Furthermore, 
the distribution of the uniquely mapped bases (in 1 kb 
bins) in the genome for the ChIP and input data also 
indicates that the peak regions in the ChIP data are 
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Figure 3 Density of read coverage ratio between peak and flank (5' and 3') regions 
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enrched in reads, while in the input data, they are 
relatively uniformly distributed (Figure S2). 

We also evaluated the FDR of the Bayesian model 
for PCWC. Similar with the VDR ChIP-seq data analysis, 
peak calling ( Pr(#reads>6)< 0.001 ) for two 
independent controls (from the VDR data sets, Table 1) 
was conducted to evaluate the FDR. We totally identified 
348 and 545 peaks in two independent controls (Table 1), 
respectively. Only less than 396 of peaks in the two 
controls overlap with the ChIP data, indicating that the 
FDR for our method is small. It should be noted that the 
PCWC is also available in the MACS (version 1.4) 
program. Thus we compared the relative effectiveness of 
our Bayesian model with the Poisson-based MACS 
model. Under the default parameter condition, the 
MACS program failed for peak calling for the two 
independent controls due to the large -mfold (-mfold=372) 
parameter, resulting in unavailable construction of the 
background model. When we decreased the -mfold 
parameter to 8 (suitable for building the background 
model), a total of 3278 and 3 486 candidate peaks are 
called with the FDRs of 5% and 8%, respectively, which 
is higher than the FDR of our Bayesian model. The 
advantages of the Bayesian model relies on the genome- 
wide scanning to empirically measure the background 
distribution, while the Poisson-based model is subject to 
overestimation of the number of candidate peaks, 
therefore resulting in higher FDRs (Kharchenko et al, 
2008). 


H3K4me3 data 

We also used the H3K4me3 ChIP-seq data 
(8.77x10* uniquely mapped reads, accounting for 69.7%) 
to compare the peak calling between the Bayesian model 
and the MACS model. Using the MACS with no controls, 
a total of 14346 peaks (Table 2) were identified when the 
cutoff was set to P=/e-05 (default). With the use of the 
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Bayesian model ( Pr(£ reads 7 9) € 0.01), we identified 
7 539 peaks, of which 7 518 (99.7%) were overlapped 
with the MACS model with no controls, indicating a 
high accuracy for the Bayesian model. To avoid the 
potentially high false negative rate resulting from the 
stringent cutoff applied, we also tested a relaxed cutoff 
for the Bayesian model ( Pr(#reads > 4) € 0.05, and 
we identified a total of 11,536 peaks. Compared with the 
MACS model using different cutoffs, the relaxed 
Bayesian model remained high overlapping rates with 
the MACS data P=/e-08, again indicating the 
effectiveness of PCWC using our Bayesian model. 


Table 2 Comparison of peak callings between Bayesian 
and MACS models 





Options MACS Bayesian Overlap Overlapping (%) 
le-05 14 346 11312 98.06 
1e-06 12 627 11 116 96.36 
le-07 11 663 10 827 93.85 
1e-08 10 841 10 432 90.43 
1e-09 10 235 10 020 86.86 


For the Bayesian model, the cutoff was set to Pr(#reads > 4) € 0.05 . 


As the likelihood function used in the Bayesian 
model is subject to simulation for optimizing the number 
of reads a, we next addressed whether the simulation 
method based on 10 000 bootstrapping per chromosome 
with w=50 bp random bins would be biased in our 
method. Firstly, we evaluated the effects of number of 
bootstrappings (ranging from 1000 to 50000), and no 
bias was observed when analyzing the H3K4me3 ChIP- 
seq data (Table 3). Similar results were obtained when 
using the VDR ChIP-seq data (data not shown). We then 
assessed the effects of random bin size (w) and found no 
bias there either, suggesting that the simulation method 
itself would not generate bias for the parameters a and c 
(Figure 4). 


Table 3 Effects of bootstrapping on the cutoff threshold 





#Reads 1 000 bootstrap 5 000 bootstrap 10 000 bootstrap 20 000 bootstrap 50 000 bootstrap 
1 0.148 0.148 0.144 0.141 0.144 
2 0.077 0.083 0.084 0.076 0.081 
3 0.052 0.060 0.062 0.052 0.056 
4 0.037 0.040 0.048 0.038 0.042 
5 0.027 0.028 0.036 0.029 0.031 
6 0.023 0.021 0.026 0.021 0.023 
7 0.017 0.016 0.018 0.015 0.016 
8 0.012 0.011 0.013 0.010 0.011 
[a 0.008 0.008 0.008 0.007 0.007 
10 0.004 0.005 0.007 0.004 0.005 
11 0.002 0.004 0.004 0.003 0.004 
13 0.000 0.002 0.003 0.001 0.002 


*: bolded row indicates that bootstraps are not biased. 
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Figure 4 Effects of random bin size on the number of reads a 


E2F1 data 

For the E2F1 ChIP-seq data, a total of 1.188x10" 
reads (~39.0% of total reads) were uniquely mapped onto 
the genome. Using the PCWC ( Pr(£reads 7 9) € 0.01), 
we identified a total of 11 470 peaks, of which the 
majority (75.3%, n=8 639) span across the annotated 
Refseq TSSs, consistent with Chen et al's (2008) report 
that about ~50% of all genes are regulated by E2F1. Our 
analysis also indicated E2F1 binding regions were very 
close to TSS (Figure S3), consistent with the previous 
ChIP-seq and ChIP-chip results (Chen et al, 2008; Xu et 
al, 2007). The 5 genes (Figure S4) experimentally 
verified to be positively regulated by E2F1, Cdc6 (Yan 
et al, 1998), Ccnel, Ccna2 (DeGregori et al, 1995), 
Mcm4 and Mcm7 (Arata et al, 2000), were all identified 
in the E2F1- regulated regions (peaks). 


Other data 

Using the Illumina platform of ChIP-seq data, we 
demonstrated the effectiveness and accuracy of PCWC. 
To further test the effectiveness of the Bayesian model, 
we used two ABI SOLID platform data sets. The raw 
colorspace-formatted data were aligned using Bowtie 
software (Langmead et al, 2009) with maximal 2 
mismatches and uniquely mapped reads were used for 
further analysis. 

For the early growth response gene 1 (EGR1) ChIP- 
seq data (Tang et al, 2010), we identified a total of 7 302 
peaks. The KEGG pathway annotation result indicates 
that genes regulated by EGRI are significantly enriched 
in the MAPK, the Wnt and the TGF-beta signaling 
pathways (hsa04010, hsa04310, hsa04350, respectively), 
congruent with earlier reported data (Tang et al, 2010). 

For the MNase-seq data in two replicates (Valouev 
et al, 2011), a total of 533401 and 514135 peaks were 
identified, respectively, where 83.12% overlapped 
between the two replicates, further supporting the 
effectiveness of peak calling with no control. 


In silico data 
To demonstrate the effectiveness of ChIP-seq 
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PCWC, we in silico generated 1x10" 36-bp reads using 
simreads of the Rmap package (Smith et al, 2009) to 
produce a simulated control sample. As expected, only 
428 peaks were generated based on the Bayesian model. 
Notably, the number of peaks called from the in silico 
data is equivalent with the two independent controls 
(Table 1), suggesting that the PCWC is sensitive. 


CONCLUSIONS 

In this study, we demonstrated an informative 
analysis of ChIP-seq PCWC based on the Bayesian 
framework approach. By the application to multiple 
ChIP-seq data sets, we have demonstrated the 
effectiveness of our method for both transcriptional 
factor and chromatin modification ChIP-seq experiments. 
Notably, the demonstration PCWCOC's effectiveness 
showed no superiority over the method with controls. 
The purpose of this study is to demonstrate a potential 
alternative strategy for designing ChIP-seq experiments 
and identifying transcriptional factor binding sequences 
without using controls. 

For ChIP-chip analysis, the tiling array may suffer 
from cross-hybridizations, therefore, the fold ratio for 
peak calling relative to background is required. By 
contrast, our analyses indicated the effectiveness of 
ChIP-seq PCWC, likely due to the finer resolution and 
greater signal-to-noise ratio of the ChIP-seq data 
(Rozowsky et al, 2009). Meanwhile, as opposed to the 
Poisson-based model for ChIP-seq peak calling, our 
Bayesian model is dependent on genome-wide scanning 
to empirically measure the background distribution, 
resulting in the higher selectivity and lower FDRs for the 
PCWC. 

Contrary to RNA-seq, ChIP-seq is more confined 
by relatively complicated pre-ChIP experiments, e.g., 
antibody specificity, large amount of cells (usually no 
less than 1x10’ cells), formaldehyde cross-link and 
supersonic shearing, etc. If the ChIP experiments ensure 
high antibody specificity, the following high-throughput 
sequencing with deep coverage to identify peaks without 
control is both plausible and robust. While depth-of- 
sequencing issues (Kharchenko et al, 2008) exist in the 
ChIP-seq experiments, our analyses suggest that no less 
than 10 million effective reads (uniquely mapped) are 
necessary for PCWC. 

Moreover, improvements of ChIP-seq experimental 
strategies without (or with) control data in an appropriate 
way may be preferable. Based on our survey, two 
biologically independent replicates is highly replicable (s 
shown in Table 1) as previously suggested (Park, 2009). 
Therefore, two replicates of ChIP-seq would be sufficient 
for peak calling. Meanwhile, if PCWC is determined, the 
integration with other data types will be essential. For 
example, the integration of ChIP-seq data with gene 
expression data (including microarray gene expression 
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data or RNA-seq data) may maximize the interpretation 
of gene regulatory network. 


Abbreviations: ChIP-seq: chromatin 
immunoprecipitation followed by sequencing; NGS: next 
generation sequencing; ChIP-chip: chromatin 


immunoprecipitation followed by tiling microarray; 
MACS: model-based analysis of ChIP-seq; PCWC: peak 
calling without controls; FDR: false discovery rates; GO: 
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Zoological Research call for papers 


The 2" and 3"! issue, 2013, “Special Issue on Amphibians and Reptiles” 


and *Special Issue on Fish" 


To whom it may concern: 


We sincerely welcome any relative contributions from you and your research teams, following 


are the details. 


1) Contribution scope and requirements 

The relative research on amphibians, reptiles and fish, which including integrative biology, 
molecular biology, genetics, biochemistry, physiology, evolution biology, reproductive biology, etc. 
The researches should focus on: the investigation of species diversity and distribution pattern and 
formation mechanism; ecology; behavior and evolution; coevolution between the adaptation to the 
environment and the morphology and function; resources and protection. Both submissions in 
English and Chinese are acceptable, while manuscripts in English are preferable. For reviews and 
reports, they should be limited within 10,000 words, and for research articles, there's no strict word 


limitations. 
2) Contact us 


You can send your articles by E-mail (zoores@mail.kiz.ac.cn) or use our online submission 
system (http://www.zoores.ac.cn/). Please label “submission for the special issue" in front of your 


word (.doc) file. 
Deadline: 31", Jan. 2013 for Special Issue on Amphibians and Reptiles; 
28". Feb. 2013 for Special Issue on Fish. 


We appreciate your support, thank you! 
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